Minimal Evacuation Times and Stability



Abstract 

We consider a system where packets (jobs) arrive for processing using one of the policies in a given class. We study the 
connection between the minimal evacuation times and the stability region of the system under the given class of policies. The result 
is used to establish the equality of information theoretic capacity region and system stability region for the multiuser broadcast 
erasure channel with feedback. 



I. Introduction 

In this work we consider a time slotted system where packets arrive to one of n different input queues; there may be other system queues to which packets are placed during their processing. The packets are processed by a policy from an admissible class. We study the connection between system stability and minimal evacuation time, i.e. the time it takes to complete processing a number of packets placed at the input queues at time 0, provided that no further arrivals occur afterwards. Under certain general assumptions on admissible policies and system statistics, it is shown that the stability region of the system is completely characterized by the asymptotic growth rate of minimal evacuation time. We make very few assumptions on the system structure and hence the result is applicable to a large number of applications in communications as well as more general control systems. However, we point out that the result, while intuitive, has to be applied with caution since there are systems for which its application leads to wrong conclusions. As an application of our methodology, we consider the N-user broadcast erasure channel with feedback. In this setup, we compare the information theoretic capacity region with the stability region and show that they are equal.

Concepts akin to evacuation time and their relation to stability have been investigated in earlier works. Baccelli and Foss [1] consider a system fed by a marked point process and operating under a given policy. The concept of dater is used to describe the time of last activity in the system, if the system is fed only by the $m$-th to $n$-th, $m < n$, of the points of the marked process. Assuming that the dater is a deterministic function of the arrival times and the marks of the point process, and under additional assumptions on dater sample paths, they show that stability under the specified policy is characterized by the asymptotic behavior of daters. These results are extended to continuous time input processes by Altman [2]. In our setup, the system evolution may depend on random factors as well as the characteristics of the arrival process. Moreover, we do not make sample path assumptions on specific policies. We rather specify features that admissible policies may have, and based on these we characterize the stability region of the class of admissible policies by the asymptotic growth rate of minimal (over all admissible policies) evacuation times.

A different, yet related, methodology is developed by Meyn [3]; the workload $w(t)$ is defined as the time the server must work to clear all of the inventory of the system at time $t$ when operating in the fluid limit. This basic concept is elaborated and used to derive significant results and obtain intuition for good control policies in specific complex networks. The concept of workload is closely related to the evacuation time; however, we make minimal assumptions on system structure and the derived results are applicable to more general systems.

Regarding the relation between the information theoretic capacity and queueing theoretic stability regions, the equality of these has been shown recently in [4] for systems without feedback. The system studied in this work uses feedback and, as will be seen, the equality can be derived in a simple manner based on the stability characterization through evacuation time.



A. Preliminaries 

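In the following, we use the vector notation $\mathbf{x} = [x_1, x_2, \ldots, x_n]^T$. Also, $\mathbf{x} \geq \mathbf{y}$ means $x_i \geq y_i$, $i = 1, 2, \ldots, n$, and

$$\lceil \mathbf{x} \rceil = (\lceil x_1 \rceil, \ldots, \lceil x_n \rceil),$$

where $\lceil x \rceil$ is the least integer larger than or equal to $x$. With $\mathbf{m}, \mathbf{k}$ we denote vectors with nonnegative integer coordinates and with $\mathbf{r}, \mathbf{s}$ vectors with nonnegative real number coordinates.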



II. System Model and Admissible Policies 

We consider a time-slotted system where slot $t = 0, 1, \ldots$ corresponds to the time interval $[t, t+1)$. The system has $n$ input queues of infinite length where packets¹ arrive. Packets arriving at each input may have certain properties, e.g., service times, priorities, routing options, etc. There may be additional queues in the system, where packets may be placed during its operation. At the beginning of time slot $t$, i.e., at time $t$, $A_i(t)$ packets arrive at input $i$. (In particular, we use $A_i(0)$ to denote the number of packets in the queue of input $i$ when the system commences operation at $t = 0$.)

¹ In this work we use the term packet, which describes an arriving unit in a communication network. However, our work applies to any general service system with arrival processes and queues, e.g. manufacturing systems, road networks, network switches, etc. Therefore, the subsequent discussion and results should be understood generically.

We assume that the arrival processes satisfy the ergodicity condition

$$\lim_{t \to \infty} \frac{\sum_{\tau=0}^{t} A_i(\tau)}{t} = \lambda_i \geq 0, \quad i = 1, 2, \ldots, n \tag{1}$$

as well as,

$$\lim_{t \to \infty} \frac{E\left[\sum_{\tau=0}^{t} A_i(\tau)\right]}{t} = \lambda_i, \quad i = 1, 2, \ldots, n \tag{2}$$
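As a small numerical illustration of condition (1), the following sketch draws i.i.d. Bernoulli arrivals at a single input and evaluates the running average. The i.i.d. Bernoulli model is an assumption made only for this illustration (the text requires just the ergodicity conditions), and the names `running_arrival_rate` and `lam` are hypothetical.

```python
import random


def running_arrival_rate(lam, horizon, seed=0):
    """Return (sum_{tau=0}^{horizon} A(tau)) / horizon for i.i.d. Bernoulli(lam) arrivals.

    Assumption for illustration only: at most one packet arrives per slot.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(horizon + 1):  # slots tau = 0, 1, ..., horizon
        if rng.random() < lam:
            total += 1
    return total / horizon


if __name__ == "__main__":
    # The printed averages should settle near lam = 0.3 as the horizon grows.
    for horizon in (100, 10_000, 1_000_000):
        print(horizon, running_arrival_rate(0.3, horizon))
```

For such a bounded i.i.d. process the expectation in (2) equals $\lambda_i (t+1)/t$, so (2) holds as well.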

The operation of the system is characterized by a finite set of system states $\mathcal{S}$, and control sets $\mathcal{G}_s$ for each $s \in \mathcal{S}$: if at the beginning of a slot the system state is $s \in \mathcal{S}$, one of the available controls $g \in \mathcal{G}_s$ is applied. There may be randomness in the behavior of the system, that is, given $s$ and $g$ at the beginning of a slot, the system state and the results at the end of a slot (e.g. packet erasures) may be random. For example, this makes the model particularly useful in wireless networks, where outcomes of transmissions may depend on channel state and ambient noise.

Arriving packets are processed by the system following a policy $\pi$, belonging to a class of admissible policies $\Pi$. At time $t$, when the system state is $s$, an admissible policy specifies:

1) The control $g \in \mathcal{G}_s$ to be chosen.

2) An action $a$ among a set of available actions $\mathcal{A}_g$ when control $g$ is chosen. An action specifies how packets are handled within the system.

The choice of controls and actions depends on the "system history" up to $t$, denoted by $\mathcal{H}_t$. The history $\mathcal{H}_t$ includes all information about packet arrival instants, packet departure instants, system states, controls, actions taken and results, up to and including time $t$.

Note that in the mathematical analysis of systems, the "state" of the mathematical model may include part of $\mathcal{H}_t$, and actions are usually not distinguished from controls. For the purposes of this work, the terms system states and controls are explicitly used to refer to the operational characteristics of the system, and are distinct from the history $\mathcal{H}_t$ and actions taken once the system characteristics are set. For example, the sizes of the queues at time $t$ should usually be considered as part of the information captured by $\mathcal{H}_t$, rather than the system state, unless the queue states directly impact the set of controls available to the system. Also, we emphasize that the choice of one action or another within a given control (for example, which particular packet is transmitted from a given queue) does not affect the system state or slot outcome. This distinction is needed in order to state precisely the statistical assumptions used in the development that follows. We next present several examples to clarify these notions.



Example 1. Assume a wireless transmitter which can transmit to a destination over one of two transmission channels, I or II (e.g. over two different carriers). Data arriving at the transmitter is classified in two types, A and B. Packets from each of the classes are placed in distinct infinite size queues.

The channels can be in one of four states, $(s_1, s_2) \in \{(l,h), (h,l), (l,l), (h,h)\}$. The controls $\mathcal{G}_s$ available when in state $s = (s_1, s_2)$ determine a) the channel to be used for transmission, and b) the transmission power $p$. This choice determines the rate of transmission $r(p, s)$ in packets per second over the chosen channel. Once a control $g$ is chosen, the action set $\mathcal{A}_g$ consists of two elements, $a_A$ and $a_B$, indicating the type of data to be transmitted over the chosen channel. The choice of action does not make a difference to the dynamics of the system state.



Example 2. Consider a communication system consisting of two nodes, a, b. Arriving packets are stored in an infinite queue at node a and must be delivered to node b. The two nodes are connected with two links, $\ell_1$, $\ell_2$, at most one of which may be activated at a time. If link $\ell_1$ is activated, a packet can be successfully transmitted in one slot, but neither link can be activated for the next 9 slots. If link $\ell_2$ is activated, a transmitted packet is erased with probability 0.5 (and received successfully with probability 0.5) and both links can be activated in the next slot.

The states for this system can be described by the elements $\{0, 1, 2, \ldots, 9\}$, where state 0 means that both links can be activated and state $i \geq 1$ means that no link can be activated for the next $i$ slots.

The control set for state 0 is $\mathcal{G}_0 = \{g_0, g_1, g_2\}$ where $g_0$ means no link activation, $g_1$ means activation of link $\ell_1$ and $g_2$ means activation of link $\ell_2$. The control set for the rest of the states consists only of $g_0$. From state 0, if control $g_0$ or $g_2$ is taken, the state returns to 0 in the next slot, while if $g_1$ is taken the state becomes 9. From state $i \geq 1$ the system moves to state $i - 1$ in the next slot.

At state 0, control $g_0$ results in "inactive" channels. If control $g_2$ is taken, the result is either "unsuccessful" or "successful" transmission on channel $\ell_2$ (a random event), and if control $g_1$ is taken, the result is "successful transmission" on channel $\ell_1$. Here, a "successful" transmission should be taken to mean that a packet will be successfully delivered to node b if transmitted in the slot (in other words, a "good" underlying transmission link); it does not preclude the respective control from including an action that makes no transmission in the slot at all.
The controls under which one of the links is activated are associated with two actions: a) the action of transmitting a packet on the corresponding link, if the queue is nonempty and b) the action of not transmitting a packet ("null" action). For the control that does not activate any link, the associated action set is only the "null" action.

During system operation, there will be a number of packets at the queue of node a. The number of packets in the queue at time $t$ is part of $\mathcal{H}_t$, not part of the system state. Based on $\mathcal{H}_t$ and $s_t$, a policy takes control $g \in \mathcal{G}_{s_t}$ and then an action $a \in \mathcal{A}_g$. Depending on the result of the control, a transmitted packet (if any) may be successfully received, or erased.
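The state, control and action structure of Example 2 can be sketched in a few lines of code. The sketch below is only an illustration under stated assumptions: it considers two simple policies that always pick the same control in state 0 and always take the "transmit" action, and it counts the slots needed to deliver $k$ packets with no further arrivals. The function names and the string labels `"g0"`, `"g1"`, `"g2"` are hypothetical.

```python
import random


def next_state(state, control):
    """State transition of Example 2; state 0 means both links may be activated."""
    if state == 0:
        return 9 if control == "g1" else 0
    return state - 1  # lockout countdown, only g0 is available here


def evacuation_time(k, control_at_zero, rng):
    """Slots needed to deliver k packets when `control_at_zero` is always
    chosen in state 0 and a packet is sent whenever a link is active."""
    state, queue, slots = 0, k, 0
    while queue > 0:
        control = control_at_zero if state == 0 else "g0"
        if control == "g1":
            queue -= 1                      # link 1 always succeeds
        elif control == "g2" and rng.random() < 0.5:
            queue -= 1                      # link 2 succeeds with probability 0.5
        state = next_state(state, control)
        slots += 1
    return slots


if __name__ == "__main__":
    rng = random.Random(1)
    print("link 1 only, k = 3:", evacuation_time(3, "g1", rng))
    runs = 20000
    mean = sum(evacuation_time(5, "g2", rng) for _ in range(runs)) / runs
    print("link 2 only, k = 5, sample mean:", mean)
```

Under this reading of the example, using only link $\ell_1$ takes $10(k-1)+1$ slots, while using only link $\ell_2$ takes $2k$ slots on average; neither of these two fixed policies is claimed to be optimal.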

Departures. There are well-defined times when each arriving packet is considered to depart from the system. For example, in a store-and-forward communication network where a packet arrives at node $i$ and must be delivered to a single node $j$, it is natural to consider the departure time as the time at which this packet is delivered to node $j$. Similarly, if the packet must be multicast to a subset $\mathcal{K}$ of the nodes, the departure time of the packet can be defined as the first time at which all nodes in $\mathcal{K}$ receive the packet. However, in some systems several definitions of departure times may make sense, and the particular choice depends on the performance measures of interest. As an example, consider the case where network coding is used to transmit encoded packets. In this case, a packet $p$ with a single destination node $j$ may be considered as departed when the destination node $j$ can decode the packet based on the packets already received by that node. On the other hand, if the decoded packet is still needed for decoding of other packets, it may be of interest to define the departure time of $p$ as the first time the packet is not needed for further decoding. At any time between the arrival and departure times of a packet $p$, we say that $p$ is "in the system".

There may be several restrictions on the policies in $\Pi$. We assume that all policies in $\Pi$ have the following features.
Features of Admissible Policies
F1) At time $t$, the history of the system up to $t$, $\mathcal{H}_t$, is fully known.

F2) At any time $t$ at which there are packets only at the inputs of the system, it is permissible to take controls and actions taking into account only the packets at the inputs at time $t$, and to proceed without taking into account the rest of the history $\mathcal{H}_t$. Formally, for any time $t$ in which the internal (non-input) queues, if any, are empty, the set of controls and actions available to a policy may only depend on the current queue state and may not depend further on $\mathcal{H}_t$.

F3) If at time $t$ there are $\mathbf{k}$ packets at the inputs of the system, it is permissible to pick any $\mathbf{m} \leq \mathbf{k}$ packets and continue processing the $\mathbf{m}$ packets, along with other packets that may be in the system, without taking into account the remaining $\mathbf{k} - \mathbf{m}$ packets. Formally, the set of controls (and actions) available to a policy must be a superset of the set of controls (and actions, respectively) that would be available if $\mathbf{k} - \mathbf{m}$ packets were removed from the input queues altogether, for any $\mathbf{m} \leq \mathbf{k}$.

Features F2 and F3 may be natural for many systems; however, there are systems where they may not be available to the policies, as the following example shows.

Example 3. Two-transmitter Aloha-type system. Consider a system consisting of two transmitters attempting to transmit arriving packets to a single destination. Each transmitter has its own queue. Activation of both transmitters in the same slot results in loss of any packet that may be transmitted. We can model this system by considering that it has a single state, and that the control set consists of pairs $(g_1, g_2)$ where $g_i = 1$ ($g_i = 0$) indicates that transmitter $i \in \{1, 2\}$ becomes active (inactive).

Consider the following classes of policies, $\Pi_1$, $\Pi_2$: admissible policies $\pi$ of both classes have Feature F1. Also, if only one transmitter queue, say transmitter 1 queue, has packets at time $t$, only the transmitter of this queue becomes active, that is, the control $(1, 0)$ is chosen. However, the policies in the two classes differ when both transmitter queues are nonempty. In this case, policies in $\Pi_1$ are free to activate any of the transmitters. Under policies in $\Pi_2$, on the other hand, the controls are chosen randomly, so that each transmitter becomes active with probability $q_t$, $0 < q_t < 1$ (and inactive with probability $1 - q_t$), $q_t$ being the same for both transmitters. An action here consists of sending a packet if a transmitter is active.

The policies in $\Pi_1$ have Feature F3 while the policies in $\Pi_2$ do not, since if, e.g., $\mathbf{k} = (1, 1)$ and control $(1, 1)$ is selected, packets from both queues must be transmitted at the same time, i.e., a policy is not allowed to transmit first the vector $(1, 0)$ and next the vector $(0, 1)$. Also, note that in both cases the policies trivially have Feature F2.

Consider now a third class of policies, $\Pi_3$, where policies act as policies in $\Pi_2$, with the following difference: a policy $\pi \in \Pi_3$ selects again a common packet transmission probability $q$ when both queues are nonempty; however, after a given number $k$ of times this probability has been selected, it must thereafter remain fixed and the policy is no longer permitted to change it. For this class of policies, Feature F2 is not satisfied.
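To make the difference between the first two classes concrete, the sketch below evaluates the time needed to deliver one packet from each queue, i.e. $\mathbf{k} = (1,1)$, under a policy of the second class. It is an illustration under assumptions that go beyond the text: the common activation probability is held at a fixed value `q` in every slot, the two transmitters are activated independently, and collided packets stay in their queues and are retried. The function names are hypothetical.

```python
import random


def mean_evacuation_time_fixed_q(q):
    """Closed form for k = (1, 1) with a fixed common activation probability q.

    A slot is useful only when exactly one transmitter is active, which
    happens with probability 2 q (1 - q). After that, a single queue is
    nonempty and its packet is sent in one more slot.
    """
    p_success = 2 * q * (1 - q)
    return 1 / p_success + 1


def simulate_fixed_q(q, runs, seed=0):
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        slots = 0
        while True:
            slots += 1
            active_1 = rng.random() < q
            active_2 = rng.random() < q
            if active_1 != active_2:  # exactly one active: no collision
                break
        total += slots + 1            # one more slot for the remaining packet
    return total / runs


if __name__ == "__main__":
    print(mean_evacuation_time_fixed_q(0.5))
    print(simulate_fixed_q(0.5, 50000))
```

Under these assumptions the mean is $1/(2q(1-q)) + 1$, which is smallest at $q = 1/2$, whereas a policy of the first class can activate the transmitters one after the other and finish in 2 slots.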

At the beginning of slot 0 let the system state be $s$ and let there be $k_i \geq 0$, $i = 1, \ldots, n$, packets at input $i$ and no arrivals afterwards, i.e., $A_i(0) = k_i$, $A_i(t) = 0$, $t = 1, 2, \ldots$. Let $T_s^{\pi}(\mathbf{k}) > 0$, $\mathbf{k} \neq \mathbf{0}$, be the time it takes until all of these packets depart from the system under policy $\pi$. We call $T_s^{\pi}(\mathbf{k})$ the evacuation time under policy $\pi$ when the system starts in state $s$ with $\mathbf{k}$ packets at the inputs, and denote its average value by $\bar{T}_s^{\pi}(\mathbf{k}) = E[T_s^{\pi}(\mathbf{k})]$, $\mathbf{k} \neq \mathbf{0}$. It will also be convenient to define $\bar{T}_s^{\pi}(\mathbf{0}) = 1$, a convention that has the meaning of advancing one slot whenever the system is empty.

Let

$$\bar{T}_s^{*}(\mathbf{k}) = \inf_{\pi \in \Pi} \bar{T}_s^{\pi}(\mathbf{k}) \tag{3}$$

and

$$\bar{T}^{*}(\mathbf{k}) = \max_{s \in \mathcal{S}} \bar{T}_s^{*}(\mathbf{k}).$$

We call $\bar{T}^{*}(\mathbf{k})$ the critical evacuation time function. It will be seen that under certain statistical assumptions, this function determines the stability region of the policies under consideration.

Note that according to the definition of $\bar{T}_s^{*}(\mathbf{k})$ in (3), for any $\epsilon > 0$ we can always find a policy $\pi$ such that

$$\bar{T}_s^{\pi}(\mathbf{k}) \leq \bar{T}_s^{*}(\mathbf{k}) + \epsilon. \tag{4}$$

This fact will be used repeatedly in the development that follows.

Next, we present statistical assumptions regarding the system under consideration. 

Statistical Assumptions 
SA1) For all $\mathbf{k}$,

$$\bar{T}^{*}(\mathbf{k}) < \infty.$$

SA2) System and arrival process statistics are known to a policy.

SA3) Markings (such as service times, permissible routing paths, etc.) associated with packets arriving to a given input are independent and statistically identical. Markings across inputs are independent.

SA4) If at the beginning of a slot $t$ the system state is $s_t \in \mathcal{S}$ and control $g_t \in \mathcal{G}_{s_t}$ is taken, the results at time $t + 1$ are independent of the system history before $t$. However, the system state $s_{t+1}$ and the results at time $t + 1$ may depend on both $s_t$ and $g_t$. Hence the system states may be affected by the controls (but not actions) taken by a policy. Formally, if $W_t$ is the (random) outcome at the end of a slot, we have for all $t$,

$$\Pr(W_{t+1}, s_{t+1} \mid s_t, g_t, \mathcal{H}_t) = \Pr(W_{t+1}, s_{t+1} \mid s_t, g_t).$$

SA5) At time $t = 0, 1, 2, \ldots$ let there be $\mathbf{k}$ packets in the system (where $k_i$ is the number of packets still in the system that originally arrived at input $i$; they may or may not still be at the input queues). There is a policy $\pi_h$ which can process all these packets until they all depart from the system by time $t + F^{\pi_h}(\mathbf{k})$ ($F^{\pi_h}(\mathbf{k})$ may be random), such that

$$E\left[F^{\pi_h}(\mathbf{k})\right] \leq C_1 \sum_{i=1}^{n} k_i + C_0, \tag{5}$$

where $C_1$, $C_0$ are finite constants (which may depend on system statistics but not on $\mathbf{k}$).

SA6) Let $\mathbf{e}_i$ be the unit $n$-dimensional vector with 1 at the $i$-th coordinate and 0 elsewhere. It holds for all $i = 1, \ldots, n$, and all $\mathbf{k}$ and $s$,

$$\bar{T}_s^{*}(\mathbf{k}) - \bar{T}_s^{*}(\mathbf{k} + \mathbf{e}_i) \leq D < \infty. \tag{6}$$

Statistical Assumption SA5 is easy to verify in several systems. For example, in a communication network a policy that usually satisfies this assumption is the one that picks one of the $\mathbf{k}$ packets, transmits it to its destination, then picks another packet and so on, until all the packets are delivered to their destinations. Note that Assumption SA5 implies SA1; we keep Assumption SA1 separate because, as will be seen shortly, only this assumption is needed to establish the key property (namely, subadditivity) of $\bar{T}^{*}(\mathbf{k})$.

Statistical Assumption SA6 is needed to justify a technical condition in the development that follows. This assumption may also be easy to verify for several systems. It says that, if the number of packets at the system inputs at time 0 is increased by one, then the minimal average evacuation time under any initial state cannot be decreased by more than a fixed amount. For example, this assumption is always satisfied if $\bar{T}_s^{*}(\mathbf{k})$ is non-decreasing in $\mathbf{k}$, i.e.,

$$\bar{T}_s^{*}(\mathbf{k}) \leq \bar{T}_s^{*}(\mathbf{k} + \mathbf{e}_i). \tag{7}$$

In particular, it can be easily shown that condition (7) holds if policies have the ability to generate "dummy" packets (i.e. packets that bear no information and are used just for policy implementation) during their operation, a feature that is available in many communication networks. Indeed, assume that at time $t = 0$ the system is in state $s$ and there are $\mathbf{k}$ packets at the system inputs. Pick $\epsilon > 0$ and a policy $\pi$ such that

$$\bar{T}_s^{\pi}(\mathbf{k} + \mathbf{e}_i) \leq \bar{T}_s^{*}(\mathbf{k} + \mathbf{e}_i) + \epsilon.$$

Consider the following policy $\pi_0$ for evacuating $\mathbf{k}$ packets: generate a "dummy" packet for input $i$, place the $\mathbf{k} + \mathbf{e}_i$ packets at the inputs and use policy $\pi$ to evacuate the system. By construction, $\bar{T}_s^{\pi_0}(\mathbf{k}) \leq \bar{T}_s^{\pi}(\mathbf{k} + \mathbf{e}_i)$ (the inequality may be strict if the departure time of the dummy packet turns out to be strictly larger than the departure times of the rest of the packets). Hence,

$$\bar{T}_s^{*}(\mathbf{k}) \leq \bar{T}_s^{\pi_0}(\mathbf{k}) \leq \bar{T}_s^{\pi}(\mathbf{k} + \mathbf{e}_i) \leq \bar{T}_s^{*}(\mathbf{k} + \mathbf{e}_i) + \epsilon.$$

Since $\epsilon$ is arbitrary, (7) follows.

To conclude the discussion of Assumption SA6 we provide an example for which Assumption SA6 holds, even though $\bar{T}^{*}(\mathbf{k})$ may decrease as $\mathbf{k}$ increases. This example is inspired by [1].

Example 4. Consider a system with two inputs. If packets from both inputs are processed simultaneously, then both depart after $a$ slots. If a single packet from any of the inputs is processed, then this packet departs in $A$ slots, where $A > a$. Admissible policies may select to transmit pairs of packets (one from each queue) or single packets. It is easily seen that

$$\bar{T}^{*}(k_1, k_2) = a \min\{k_1, k_2\} + A\,|k_1 - k_2|.$$

Hence, for any $k$, $\bar{T}^{*}(k, k+1) = ak + A$ and $\bar{T}^{*}(k+1, k+1) = a(k+1) < \bar{T}^{*}(k, k+1)$. On the other hand, we always have,

$$\bar{T}^{*}(k_1, k_2) - \bar{T}^{*}(k_1 + 1, k_2) = a \min\{k_1, k_2\} + A\,|k_1 - k_2| - a \min\{k_1 + 1, k_2\} - A\,|k_1 + 1 - k_2| \leq A.$$
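The two claims of Example 4 are easy to check numerically. The following sketch evaluates the closed-form expression of the example on a grid, for the illustrative (assumed) values $a = 1$, $A = 3$; the function names are hypothetical.

```python
def t_star(k1, k2, a, A):
    """Closed-form critical evacuation time of Example 4."""
    return a * min(k1, k2) + A * abs(k1 - k2)


def max_decrease(a, A, limit):
    """Largest value of T*(k) - T*(k + e_i) over 0 <= k1, k2 < limit, i = 1, 2."""
    worst = float("-inf")
    for k1 in range(limit):
        for k2 in range(limit):
            base = t_star(k1, k2, a, A)
            worst = max(
                worst,
                base - t_star(k1 + 1, k2, a, A),
                base - t_star(k1, k2 + 1, a, A),
            )
    return worst


if __name__ == "__main__":
    a, A = 1, 3
    # Adding a packet can reduce the evacuation time ...
    print(t_star(3, 4, a, A), t_star(4, 4, a, A))
    # ... but never by more than A, so (6) holds with D = A.
    print(max_decrease(a, A, 20))
```

On this grid the largest decrease equals $A - a$, which is attained whenever the added packet completes a pair.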

III. Properties of Critical Evacuation Time Function 
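The following property of the critical evacuation time function will play a key role in the following.

Lemma 5. The Critical Evacuation Time Function is subadditive, i.e., the following holds for $\mathbf{m} \geq \mathbf{0}$, $\mathbf{k} \geq \mathbf{0}$:

$$\bar{T}^{*}(\mathbf{k} + \mathbf{m}) \leq \bar{T}^{*}(\mathbf{k}) + \bar{T}^{*}(\mathbf{m}). \tag{8}$$

Proof: Let $\epsilon > 0$ and let the system be in state $s$ at time 0. An admissible policy $\pi$ that evacuates $\mathbf{k} + \mathbf{m}$ packets is the following.

a) Pick an admissible policy $\pi_k$ such that,

$$\bar{T}_s^{\pi_k}(\mathbf{k}) \leq \bar{T}_s^{*}(\mathbf{k}) + \epsilon/2.$$

b) Evacuate the $\mathbf{k}$ packets following policy $\pi_k$. According to Feature F3 this is permissible. From Statistical Assumption SA4 we conclude that the average evacuation time in this case is $\bar{T}_s^{\pi_k}(\mathbf{k})$. Let $s_1$ be the state of the system by time $T_s^{\pi_k}(\mathbf{k})$. Both $s_1$ and $T_s^{\pi_k}(\mathbf{k})$ are known to $\pi_k$ (hence to $\pi$), due to Feature F1. Note that $s_1$ is a random variable that depends on $s$.

c) Again, pick an admissible policy $\pi_m$ such that,

$$\bar{T}_{s_1}^{\pi_m}(\mathbf{m}) \leq \bar{T}_{s_1}^{*}(\mathbf{m}) + \epsilon/2.$$

According to Feature F2 this choice of $\pi_m$ is permissible.

d) Evacuate the $\mathbf{m}$ packets following policy $\pi_m$. Due to Statistical Assumptions SA3 and SA4 the average evacuation time (given $s_1$) in this case is $\bar{T}_{s_1}^{\pi_m}(\mathbf{m})$.

The average evacuation time of $\pi$ is

$$\bar{T}_s^{\pi}(\mathbf{k} + \mathbf{m}) = \bar{T}_s^{\pi_k}(\mathbf{k}) + E\left[\bar{T}_{s_1}^{\pi_m}(\mathbf{m})\right] \leq \bar{T}_s^{*}(\mathbf{k}) + E\left[\bar{T}_{s_1}^{*}(\mathbf{m})\right] + \epsilon, \tag{9}$$

where the expectation in (9) is with respect to the random variable $s_1$. Hence,

$$\begin{aligned}
\bar{T}^{*}(\mathbf{k} + \mathbf{m}) &= \max_{s \in \mathcal{S}} \bar{T}_s^{*}(\mathbf{k} + \mathbf{m}) \\
&\leq \max_{s \in \mathcal{S}} \bar{T}_s^{\pi}(\mathbf{k} + \mathbf{m}) && \text{according to (3)} \\
&\leq \max_{s \in \mathcal{S}} \left\{ \bar{T}_s^{*}(\mathbf{k}) + E\left[\bar{T}_{s_1}^{*}(\mathbf{m})\right] \right\} + \epsilon && \text{according to (9)} \\
&\leq \max_{s \in \mathcal{S}} \bar{T}_s^{*}(\mathbf{k}) + \max_{s \in \mathcal{S}} \bar{T}_s^{*}(\mathbf{m}) + \epsilon && \text{since } \bar{T}_{s_1}^{*} \leq \max_{s \in \mathcal{S}} \bar{T}_s^{*} \\
&= \bar{T}^{*}(\mathbf{k}) + \bar{T}^{*}(\mathbf{m}) + \epsilon.
\end{aligned}$$

Since $\epsilon$ is arbitrary, the lemma follows. ■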
Let $\mathbb{N}_0$ and $\mathbb{R}_0$ be respectively the set of nonnegative integers and nonnegative real numbers. We extend the domain of definition of $\bar{T}^{*}(\mathbf{k})$ from $\mathbb{N}_0^n$ to $\mathbb{R}_0^n$ as follows. For $\mathbf{r} \in \mathbb{R}_0^n$, let

$$\bar{T}^{*}(\mathbf{r}) = \bar{T}^{*}(\lceil \mathbf{r} \rceil). \tag{10}$$

The function $\bar{T}^{*}(\mathbf{r})$ is not necessarily subadditive in $\mathbb{R}_0^n$, since, in general, subadditivity at integer points does not imply subadditivity over $\mathbb{R}_0$. For example, the function $f(2l) = al$ and $f(2l + 1) = al + A$, $l = 0, 1, \ldots$, with $a < A$, is subadditive in $\mathbb{N}_0$, while for $r_1 = r_2 = 1.5$, $f(\lceil r_1 + r_2 \rceil) = f(3) = a + A$ and $f(\lceil r_1 \rceil) + f(\lceil r_2 \rceil) = 2a < f(\lceil r_1 + r_2 \rceil)$. However, as the next theorem shows, $\bar{T}^{*}(\mathbf{r})$ possesses the basic property of subadditive functions, namely the asymptotically linear rate of growth.
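The counterexample above can be verified directly. The sketch below checks subadditivity of $f$ on a range of integers and then evaluates the ceiling extension at $r_1 = r_2 = 1.5$, for the illustrative (assumed) values $a = 1$, $A = 2$; the helper names are hypothetical.

```python
import math


def f(n, a, A):
    """f(2l) = a*l and f(2l + 1) = a*l + A for nonnegative integers n."""
    half, odd = divmod(n, 2)
    return a * half + A * odd


def is_subadditive_on_integers(a, A, limit):
    """Check f(m + n) <= f(m) + f(n) for all 0 <= m, n < limit."""
    return all(
        f(m + n, a, A) <= f(m, a, A) + f(n, a, A)
        for m in range(limit)
        for n in range(limit)
    )


def extended(r, a, A):
    """Extension to nonnegative reals through the ceiling, as in (10)."""
    return f(math.ceil(r), a, A)


if __name__ == "__main__":
    a, A = 1, 2
    print(is_subadditive_on_integers(a, A, 50))
    lhs = extended(1.5 + 1.5, a, A)                 # f(3) = a + A
    rhs = extended(1.5, a, A) + extended(1.5, a, A)  # 2 f(2) = 2a
    print(lhs, rhs, lhs <= rhs)
```

Only the finite range tested is verified by the code; subadditivity on all of $\mathbb{N}_0$ follows from the case analysis on the parities of the two arguments.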
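Theorem 6. For any $\mathbf{r} \in \mathbb{R}_0^n$, the limit function

$$\hat{T}(\mathbf{r}) = \lim_{t \to \infty} \frac{\bar{T}^{*}(t\mathbf{r})}{t} \tag{11}$$

exists and is finite, positively homogeneous, convex and Lipschitz continuous, i.e., it holds

$$\left|\hat{T}(\mathbf{r}) - \hat{T}(\mathbf{s})\right| \leq D \sum_{i=1}^{n} |r_i - s_i|,$$

where $D$ is a positive constant. Moreover, for any sequence $\mathbf{r}_t \in \mathbb{R}_0^n$ such that

$$\lim_{t \to \infty} \frac{\mathbf{r}_t}{t} = \boldsymbol{\lambda} < \infty,$$

it holds

$$\lim_{t \to \infty} \frac{\bar{T}^{*}(\mathbf{r}_t)}{t} = \hat{T}(\boldsymbol{\lambda}). \tag{12}$$

Here, "positively homogeneous" means that for any $\beta > 0$,

$$\hat{T}(\beta \mathbf{r}) = \beta \hat{T}(\mathbf{r}). \tag{13}$$

The proof of Theorem 6 is given in the Appendix.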
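As an illustration of the limit (11), consider again the closed-form function of Example 4, $\bar{T}^{*}(k_1, k_2) = a \min\{k_1, k_2\} + A|k_1 - k_2|$. For this particular function the limit can be read off directly as $\hat{T}(r_1, r_2) = a \min\{r_1, r_2\} + A|r_1 - r_2|$; the sketch below compares it with the scaled values $\bar{T}^{*}(\lceil t\mathbf{r} \rceil)/t$. The numerical values and the function names are assumptions made for the illustration.

```python
import math


def t_star(k1, k2, a, A):
    """Closed-form critical evacuation time of Example 4 (integer arguments)."""
    return a * min(k1, k2) + A * abs(k1 - k2)


def scaled(r1, r2, t, a, A):
    """T*(ceil(t r)) / t, i.e. the quantity whose limit is taken in (11)."""
    return t_star(math.ceil(t * r1), math.ceil(t * r2), a, A) / t


def t_hat(r1, r2, a, A):
    """Candidate limit function for this example."""
    return a * min(r1, r2) + A * abs(r1 - r2)


if __name__ == "__main__":
    a, A = 1, 3
    for t in (10, 1000, 10**6):
        print(t, scaled(0.3, 0.5, t, a, A), t_hat(0.3, 0.5, a, A))
```

The gap between the two printed columns shrinks like $1/t$, since the ceiling changes each coordinate by less than one packet.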

IV. Stability - Necessity 

Let $D_{s,i}^{\pi}(t)$, $t \geq 1$, be the number of packet arrivals at input $i$ that have departed from the system during time slot $t$ under policy $\pi \in \Pi$ when the system starts in state $s$. Define also $D_{s,i}^{\pi}(0) = 0$. In the following we will use the notation

$$\hat{A}_i(t) = \sum_{\tau=0}^{t} A_i(\tau), \qquad \hat{D}_{s,i}^{\pi}(t) = \sum_{\tau=0}^{t} D_{s,i}^{\pi}(\tau)$$

to denote the cumulative number of arrivals and departures, respectively, up to time $t$. Hence the number of packet arrivals at input $i$ that are still in the system at time $t$ is $Q_{s,i}^{\pi}(t) = \hat{A}_i(t) - \hat{D}_{s,i}^{\pi}(t)$ (these packets may at time $t$ be scattered among internal system queues as well as the original input queue). We define the vector $\mathbf{Q}_s^{\pi}(t) = (Q_{s,i}^{\pi}(t))_{i=1}^{n}$ and the total system occupancy

$$Q_s^{\pi}(t) = \sum_{i=1}^{n} Q_{s,i}^{\pi}(t).$$

Let $\mathcal{M}$ be a probability measure over the space of permissible arrival processes; in other words, $\mathcal{M}$ captures the statistical assumptions about the arrival processes, such as the distribution of the arrival sizes, whether or not the arrivals are independent over time and between queues, etc. Let $\mathcal{M}_{\lambda}$ be a probability measure over arrival processes that satisfy ergodicity conditions (1)-(2) with a rate vector $\boldsymbol{\lambda}$.

Definition 7. System Stability. A policy $\pi \in \Pi$ is called stable for an arrival rate vector $\boldsymbol{\lambda} \geq \mathbf{0}$, if under any initial system state $s$, the following holds:

$$\lim_{q \to \infty} \limsup_{t \to \infty} \Pr\left(Q_s^{\pi}(t) > q\right) = 0 \tag{14}$$

(where the probability in (14) is taken with respect to the arrival process statistics $\mathcal{M}_{\lambda}$, as well as the system internal state transitions).

The stability region $\mathcal{R}^{\pi}$ of a policy $\pi$ (under $\mathcal{M}$) is the closure of the set of the arrival rate vectors for which the policy is stable. The stability region $\mathcal{R}$ of the system is the closure of the union of $\mathcal{R}^{\pi}$, $\pi \in \Pi$.²

We show in Theorem 9 below that under (1) and (2), it holds $\mathcal{R} \subseteq \{\mathbf{r} \geq \mathbf{0} : \hat{T}(\mathbf{r}) \leq 1\}$. Furthermore, in Section V we show that under the assumption that the packet arrival vectors are i.i.d. over time, we also have $\{\mathbf{r} \geq \mathbf{0} : \hat{T}(\mathbf{r}) \leq 1\} \subseteq \mathcal{R}$, hence $\mathcal{R} = \{\mathbf{r} \geq \mathbf{0} : \hat{T}(\mathbf{r}) \leq 1\}$, and we show an explicit policy called "Epoch-based" that is stabilizing.
For the proof of Theorem 9 we need the following lemma.

² We emphasize that the stability region of a policy may in general depend on the permitted statistical assumptions about the arrival processes; for example, a policy may be unstable for a certain rate vector $\boldsymbol{\lambda}$ if general stationary arrival processes are allowed, but become stable if the individual queue arrivals are required to be independent. The above definition of stability is generic and captures a number of common definitions of stability in the literature, and the subsequent discussion in this section is orthogonal to any specific assumptions imposed on the arrival process, beyond the basic ergodicity condition of (1)-(2).






Lemma 8. If (1), (2) and (14) hold, then

$$\lim_{t \to \infty} \frac{E\left[Q_s^{\pi}(t)\right]}{t} = 0. \tag{15}$$

Proof: It follows from (1), (2) and the corollary to Theorem 16.14 in [5] that the sequences $\{\hat{A}_i(t)/t\}$, $i = 1, \ldots, n$, are uniformly integrable, hence the sequence $\{\sum_{i=1}^{n} \hat{A}_i(t)/t\}$ is also uniformly integrable. Since

$$0 \leq \frac{Q_s^{\pi}(t)}{t} \leq \frac{\sum_{i=1}^{n} \hat{A}_i(t)}{t},$$

we conclude that the sequence $\{Q_s^{\pi}(t)/t\}$ is also uniformly integrable. We will show in the next paragraph that $\{Q_s^{\pi}(t)/t\}$ converges in probability to 0. Equality (15) will then follow from Theorem 25.12 in [5].

Pick any $\epsilon > 0$ (arbitrarily small) and a $q > 0$ large enough so that according to (14) it holds,

$$\limsup_{t \to \infty} \Pr\{Q_s^{\pi}(t) > q\} \leq \epsilon.$$

Since we can pick $t_0$ large enough so that $\epsilon t \geq q$, $t \geq t_0$, we have

$$\limsup_{t \to \infty} \Pr\left\{\frac{Q_s^{\pi}(t)}{t} > \epsilon\right\} \leq \limsup_{t \to \infty} \Pr\{Q_s^{\pi}(t) > q\} \leq \epsilon,$$

i.e., $\{Q_s^{\pi}(t)/t\}$ converges in probability to 0.

Theorem 9. Let (1), (2) hold. If $\mathbf{r} \in \mathcal{R}$ then,

$$\hat{T}(\mathbf{r}) \leq 1.$$

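Proof: Pick $\mathbf{r} \in \mathcal{R}$. Since $\mathbf{r}$ belongs to the closure of the rates for which the system is stabilizable, for any $\delta > 0$ we can find a $\boldsymbol{\lambda} \geq \mathbf{0}$, $\|\boldsymbol{\lambda} - \mathbf{r}\| \leq \delta$, for which the system is stable under some policy $\pi_0 \in \Pi$. By continuity of $\hat{T}(\mathbf{r})$ it suffices to show that for any such $\boldsymbol{\lambda}$,

$$\hat{T}(\boldsymbol{\lambda}) \leq 1. \tag{16}$$

Let the initial system state be $s \in \mathcal{S}$. Fix an arbitrary time index $t$ and generate random numbers of packets $\mathbf{A}(0), \ldots, \mathbf{A}(t)$ according to the distribution of the arrival processes. Consider that all $\hat{\mathbf{A}}(t) = \sum_{\tau=0}^{t} \mathbf{A}(\tau)$ packets are in the system at the beginning of time and construct the following evacuation policy $\pi$.

1. Mimic the actions of policy $\pi_0$ for up to $t$ time slots, assuming that the packet arrival process at time $\tau$ is $\mathbf{A}(\tau)$, $\tau = 1, \ldots, t$. Due to Statistical Assumption SA2 and Features F1, F3 this mimicking is permissible.³

2. If all $\hat{\mathbf{A}}(t)$ packets are transmitted by time $t$ then the evacuation time of $\pi$ is at most $t$. Else, after $t$ time slots there will be $Q_s^{\pi_0}(t) > 0$ packets in the system. According to Statistical Assumption SA5 pick a policy $\pi_h$ to evacuate the $\mathbf{Q}_s^{\pi_0}(t)$ packets in $F^{\pi_h}(\mathbf{Q}_s^{\pi_0}(t))$ slots, where

$$E\left[F^{\pi_h}\left(\mathbf{Q}_s^{\pi_0}(t)\right) \,\middle|\, \hat{\mathbf{A}}(t), \hat{\mathbf{D}}_s^{\pi_0}(t)\right] \leq C_1 Q_s^{\pi_0}(t) + C_0, \quad \text{by (5)}. \tag{17}$$

The evacuation time of $\pi$ given $\hat{\mathbf{A}}(t)$ is at most $t + F^{\pi_h}(\mathbf{Q}_s^{\pi_0}(t))$ ("at most", because all $\hat{\mathbf{A}}(t)$ packets may have left before time $t$) and hence, taking the conditional average, we have

$$\begin{aligned}
\bar{T}_s^{\pi}\left(\hat{\mathbf{A}}(t)\right) &\leq t + E\left[F^{\pi_h}\left(\mathbf{Q}_s^{\pi_0}(t)\right) \,\middle|\, \hat{\mathbf{A}}(t)\right] \\
&= t + E\left[ E\left[F^{\pi_h}\left(\mathbf{Q}_s^{\pi_0}(t)\right) \,\middle|\, \hat{\mathbf{A}}(t), \hat{\mathbf{D}}_s^{\pi_0}(t)\right] \,\middle|\, \hat{\mathbf{A}}(t)\right] \\
&\leq t + C_1 E\left[Q_s^{\pi_0}(t) \,\middle|\, \hat{\mathbf{A}}(t)\right] + C_0 \quad \text{by (17)}.
\end{aligned}$$

Next, using the last inequality,

$$\begin{aligned}
\bar{T}^{*}\left(\hat{\mathbf{A}}(t)\right) &= \max_{s \in \mathcal{S}} \left\{\bar{T}_s^{*}\left(\hat{\mathbf{A}}(t)\right)\right\} \\
&\leq t + C_1 \max_{s \in \mathcal{S}} E\left[Q_s^{\pi_0}(t) \,\middle|\, \hat{\mathbf{A}}(t)\right] + C_0 \\
&\leq t + C_1 \sum_{s \in \mathcal{S}} E\left[Q_s^{\pi_0}(t) \,\middle|\, \hat{\mathbf{A}}(t)\right] + C_0.
\end{aligned}$$

Taking expectations with respect to $\hat{\mathbf{A}}(t)$ and dividing by $t$, we have

$$E\left[\frac{\bar{T}^{*}\left(\hat{\mathbf{A}}(t)\right)}{t}\right] \leq 1 + C_1 \sum_{s \in \mathcal{S}} \frac{E\left[Q_s^{\pi_0}(t)\right]}{t} + \frac{C_0}{t}.$$

Since

$$\lim_{t \to \infty} \frac{\hat{\mathbf{A}}(t)}{t} = \boldsymbol{\lambda}, \quad \text{by (1)},$$

using (12) from Theorem 6 we then obtain,

$$\lim_{t \to \infty} \frac{\bar{T}^{*}\left(\hat{\mathbf{A}}(t)\right)}{t} = \hat{T}(\boldsymbol{\lambda}).$$

Hence,

$$\begin{aligned}
\hat{T}(\boldsymbol{\lambda}) &= E\left[\lim_{t \to \infty} \frac{\bar{T}^{*}\left(\hat{\mathbf{A}}(t)\right)}{t}\right] \\
&\leq \liminf_{t \to \infty} E\left[\frac{\bar{T}^{*}\left(\hat{\mathbf{A}}(t)\right)}{t}\right] && \text{by Fatou's lemma} \\
&\leq 1 + C_1 \sum_{s \in \mathcal{S}} \lim_{t \to \infty} \frac{E\left[Q_s^{\pi_0}(t)\right]}{t} + \lim_{t \to \infty} \frac{C_0}{t} \\
&= 1 && \text{by (15)}.
\end{aligned} \tag{18}$$

³ We remark that the theorem continues to hold even if anticipative policies are allowed, i.e., if Feature F1 is revised so that the information available to a policy includes not just the past history up to time $t$, but future packet arrivals as well. If $\pi_0$ is anticipative, one can accordingly generate random variables $\mathbf{A}(\tau)$, $\tau = t + 1, \ldots$ so that $\pi$ can mimic $\pi_0$ taking into account the future arrivals; the rest of the proof then remains unchanged.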



We note that there are classes of policies for which the limit $\hat{T}(\boldsymbol{\lambda})$ can be formally defined, but Theorem 9 does not hold in all its generality since some of the Features of admissible policies in Section II are not satisfied. The next example shows the case where Feature F3 is not satisfied.

Example 10. Consider the following system. There are two inputs. Policies may decide to process no packets in a slot;
otherwise processing of packets must obey the following rule. If only one of the inputs has packets, a single packet from the
nonempty input is processed in 1 time slot. If, on the other hand, both queues are nonempty, then pairs of packets from both
queues must be processed in 3 time slots. This system is a simplified version of the system in Example 3 and the specified
policies do not satisfy Feature F3. It can be easily seen that

$$T^*(k_1, k_2) = 3\min\{k_1, k_2\} + |k_1 - k_2|,$$

hence formally,

$$\hat{T}(r_1, r_2) = 3\min\{r_1, r_2\} + |r_1 - r_2|.$$

The region $\hat{T}(r_1, r_2) \le 1$ is described by

$$\{r \ge 0 : r_1 + 2r_2 \le 1 \text{ and } r_1 \ge r_2\} \cup \{r \ge 0 : 2r_1 + r_2 \le 1 \text{ and } r_2 \ge r_1\}. \qquad (19)$$

Clearly, the vector (1/2, 1/2) does not belong to this region. Consider, however, that 1 packet arrives in even slots to input
1 and 1 packet in odd slots to input 2, hence the arrival rate vector is (1/2, 1/2). Then simply processing immediately the
arriving packets results in a stable policy.

Notice also that the region in (19) is not convex, while the region in Theorem 9 is convex since $\hat{T}(r_1, r_2)$ is convex.
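
The claims of Example 10 can be checked numerically. The following Python sketch is illustrative only; the function names (`t_hat`, `in_formal_region`, `simulate_alternating`) are hypothetical and do not appear in the original text. It evaluates the formal limit, confirms that (1/2, 1/2) lies outside region (19) even though it is the midpoint of two points inside it (so the region is not convex), and replays the alternating arrival pattern under the serve-immediately policy.

```python
def t_hat(r1, r2):
    """Formal limit of Example 10: 3*min(r1, r2) + |r1 - r2|."""
    return 3 * min(r1, r2) + abs(r1 - r2)


def in_formal_region(r1, r2):
    """Membership in the formally obtained region (19)."""
    return r1 >= 0 and r2 >= 0 and t_hat(r1, r2) <= 1


def simulate_alternating(num_slots):
    """Input 1 gets a packet in even slots, input 2 in odd slots.

    Each arriving packet is served at once; since the other queue is
    empty, the single-packet rule applies and service takes 1 slot.
    Returns the largest backlog seen at the end of any slot.
    """
    queues = [0, 0]
    max_backlog = 0
    for t in range(num_slots):
        queues[t % 2] += 1                      # arrival at the start of slot t
        assert (queues[0] == 0) != (queues[1] == 0)  # exactly one queue is nonempty
        queues[t % 2] -= 1                      # served within the same slot
        max_backlog = max(max_backlog, sum(queues))
    return max_backlog


if __name__ == "__main__":
    print(t_hat(0.5, 0.5), in_formal_region(0.5, 0.5))
    print(simulate_alternating(1000))
```

The points (1, 0) and (0, 1) satisfy the formal condition while their midpoint does not, which is the non-convexity noted above.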

The arrival processes in the previous example are not stationary, hence one may wonder whether imposing slightly stronger
assumptions on the arrival processes would render the claim of Theorem 9 valid. An example is presented below, where the
arrival processes are i.i.d. but Theorem 9 still does not hold since admissible policies do not satisfy Feature F3.

Example 11. Let $M > 1$ and consider a system with a single input and the following restriction on the policies. If the number
of packets at the input is

$$k = lM + v, \quad 0 \le v \le M - 1,$$

then a policy may either decide to idle in a slot or to transmit $m$ packets, $1 \le m \le M + v$, in which case it takes $l$ slots to
process all $m$ packets. Under this restriction we have



r*(fc) = E 



\ 
»=i 



hence, 

t— ¥00 t 

_ Um 1 ( (M-v t )(M-v t + l) 
= 00. 

Applying formally Theorem 9 we deduce that the system is unstable for any positive arrival rate. Consider, however, that the
arrival process is i.i.d. but bounded, such that at most $2M - 1$ packets may arrive at the beginning of each slot (including slot
0, i.e., to be in the system when it commences operation). Then the policy that transmits all the packets immediately is stable,
i.e., under the stated conditions on arrival process statistics, the system is stable for any arrival rate $\lambda \le 2M - 1$.

For the systems described in the last two examples, there were rates outside the region obtained by using formally $\hat{T}(r)$,
for which the systems were stabilizable. The next example shows an opposite case, namely where the system is unstable for
rates inside the formally obtained region (again, due to not satisfying Feature F3).

Example 12. System with priorities and switchover times. Consider a single server with two inputs, where arrivals at input
1 have priority over arrivals at input 2. If there are packets from input 1 in the system, one of these packets must be served.
Packets from input 2 may be delayed by a policy. Packets are of length 1 slot. There is a preparatory time of 1 slot to set the
system to serve packets from a given input. Hence, when the system changes from serving packets of one input to serving
packets of the other input, there is an idle slot. The system may start by having the server ready to serve one of the two inputs.

The system has two states, $s_1, s_2$, where state $s_i$ means that the server is set to serve packets of input $i$. For this system,
we have

$$T^*_{s_1}(k_1, k_2) = \begin{cases} k_1 + 1 + k_2 & \text{if } k_2 \neq 0 \\ k_1 & \text{if } k_2 = 0 \end{cases}$$

and

$$T^*_{s_2}(k_1, k_2) = \begin{cases} 1 + k_1 + 1 + k_2 & \text{if } k_1 \neq 0,\ k_2 \neq 0 \\ 1 + k_1 & \text{if } k_2 = 0 \\ k_2 & \text{if } k_1 = 0. \end{cases}$$

Hence,

$$\hat{T}(r_1, r_2) = r_1 + r_2$$

and the region obtained formally is

$$\{r \ge 0 : r_1 + r_2 \le 1\}.$$

Consider, however, an arrival pattern where the system starts at state $s_1$, and a single packet arrives at input 1 at every
$t = 4k$, $k = 0, 1, \ldots$, hence $\lambda_1 = .25$. Packets at input 2 arrive according to an i.i.d. process of rate $\lambda_2 > .5$. It can be easily
checked that in any interval $[4k, 4k + 8)$, the number of packets served from input 2 cannot be larger than 4, hence the
departure rate for packets at input 2 cannot be more than .5 and the system is unstable, even though $\lambda_1 + \lambda_2 < 1$.

One may wonder whether, if the initial state of the system at time $t = 0$ is fixed, say $s(0) = s_0$, stability is determined
by $T^*_{s_0}(k)$ only. The following final example illustrates that this is not always the case, i.e., the condition of Theorem 9 applies
to the critical (worst-case) evacuation time function, and not just the evacuation time function corresponding to $s_0$.

Example 13. Consider a system with two servers, where server 1 takes $l$ slots to serve a packet, and server 2 takes $L > l$
slots. The system can be in one of three states, (0,0), (1,0), (0,1), where 0 denotes an inactive and 1 denotes an active server.
Suppose that there are no (or null) controls, and that state transitions are random with the following transition probabilities.

$$\Pr\{(1,0) \mid (0,0)\} = \Pr\{(0,1) \mid (0,0)\} = \Pr\{(0,0) \mid (0,0)\} = \tfrac{1}{3}, \qquad \Pr\{(1,0) \mid (1,0)\} = \Pr\{(0,1) \mid (0,1)\} = 1.$$
If the system starts at state (0,0), it takes on average 1.5 slots to move to one of the other states, and the transition to either
state occurs with equal probability. Then, since no further change of states occurs afterwards, it will take either $lk$ or $Lk$ slots
to evacuate $k$ packets. Hence,

$$\bar{T}^*_{(0,0)}(k) = \frac{3}{2} + \frac{l + L}{2}\,k.$$

It can also be easily verified that

$$\bar{T}^*_{(1,0)}(k) = lk, \qquad \bar{T}^*_{(0,1)}(k) = Lk,$$

hence,

$$\hat{T}^*(k) = \max\left\{\frac{3}{2} + \frac{l + L}{2}\,k,\ lk,\ Lk\right\}$$

and $\hat{T}(r) = Lr$, which results in the stability condition $\lambda \le 1/L$.

Assume now that the system starts in state $s_0 = (0, 0)$ (an initial state that may be "natural" in some sense), and formally
use $\bar{T}^*_{(0,0)}(k)$ in place of $\hat{T}^*(k)$. Then, we would conclude that $\hat{T}(r) = \frac{l + L}{2}\,r$ and hence that the system is stable when

$$\lambda \le \frac{2}{l + L}.$$

This, however, is wrong since for $\frac{2}{l + L} > \lambda > \frac{1}{L}$, under state transition $(0,0) \to (0,1)$, an event of positive probability, the
input rate will be larger than the output rate.
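
The quantities of Example 13 can be reproduced with exact arithmetic. The Python sketch below is only an illustration; the function names are hypothetical, and the values $l = 1$, $L = 3$ used in the usage lines are an arbitrary choice, not taken from the text. It recomputes the mean time to leave state (0,0), the mean evacuation time from (0,0), and the worst case over the three initial states.

```python
from fractions import Fraction


def expected_exit_time(p_stay):
    """Mean number of slots needed to leave state (0,0).

    Solves E = 1 + p_stay * E, i.e. the mean of a geometric variable.
    """
    return 1 / (1 - p_stay)


def mean_evac_from_idle(k, l, L):
    """Mean evacuation time of k packets when starting from (0,0)."""
    wait = expected_exit_time(Fraction(1, 3))   # 3/2 slots on average
    return wait + Fraction(l + L, 2) * k        # then l*k or L*k, each w.p. 1/2


def critical_evac(k, l, L):
    """Worst case of the mean evacuation time over the three initial states."""
    return max(mean_evac_from_idle(k, l, L), Fraction(l * k), Fraction(L * k))


if __name__ == "__main__":
    l, L, k = 1, 3, 1000
    print(mean_evac_from_idle(k, l, L) / k)   # close to (l + L) / 2
    print(critical_evac(k, l, L) / k)         # equals L for large k
```

With $l = 1$ and $L = 3$, a rate such as $\lambda = 2/5$ satisfies $1/L < \lambda < 2/(l+L)$: it passes the test based on the (0,0) evacuation time alone, yet exceeds the service rate $1/L$ that applies once the system has moved to state (0,1).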



V. Epoch Based Policy - Sufficiency 

In this section, we consider a specific policy which we henceforth refer to as an Epoch-Based policy. The idea of the policy 
(which is defined formally below) is to divide the time into 'epochs' and focus on the efficient evacuation of packets present 
in the system at the start of an epoch, while ignoring any new packets that arrive during the epoch. The main result of this 
section is that, for the special case of independent and identically distributed (i.i.d) arrival processes, the epoch-based policy is 
throughput-optimal, provided that the underlying evacuation policy within each epoch is efficient (i.e., informally, minimizes 
the expected evacuation time for the packets present at the start of the epoch). More precisely, in this section we make the 
assumption that the arrival process vectors $A(t)$ are i.i.d. with respect to time for $t = 1, 2, \ldots$ (for a given time slot $t$, the
components of the vector $A(t)$ may be dependent; also, the initial number of packets in the system at $t = 0$, namely $A(0)$,
can be arbitrary and is not required to have the same distribution as for $t \ge 1$). We then show that the epoch-based policy is
stabilizing for any such arrival processes if the arrival rate $\lambda$ satisfies $\hat{T}(\lambda) < 1$.

Consider the set

$$\mathcal{R}_1 = \{\lambda \ge 0 : \hat{T}(\lambda) < 1\}.$$

This set is nonempty, since

$$\hat{T}(0) = \lim_{t \to \infty} \frac{\hat{T}^*(t \cdot 0)}{t} = 0, \qquad (20)$$

hence $0 \in \mathcal{R}_1$. We will construct a policy that is stable for any $\lambda \in \mathcal{R}_1$. The continuity and convexity of $\hat{T}(\lambda)$ and (20) imply
that the closure of $\mathcal{R}_1$ is the set $\{\lambda \ge 0 : \hat{T}(\lambda) \le 1\}$ and hence,

$$\{\lambda \ge 0 : \hat{T}(\lambda) \le 1\} \subseteq \mathcal{R}.$$

Combined with the necessity result of Section IV we then conclude that

$$\mathcal{R} = \{\lambda \ge 0 : \hat{T}(\lambda) \le 1\}.$$

We now present a policy that stabilizes the system for any $\lambda \in \mathcal{R}_1$, that is,

$$\hat{T}(\lambda) < 1. \qquad (21)$$

A version of this policy was used in [6] to provide a stabilizing policy for a two-user broadcast erasure channel with feedback.
Definition 14. Epoch-Based Policy $\pi_\epsilon$: Pick $\epsilon > 0$ such that

$$0 < \epsilon < 1 - \hat{T}(\lambda),$$

and for each $k$ and $s$, pick an evacuation policy $\pi_{k,s}$ such that

$$\bar{T}_{\pi_{k,s},\,s}(k) \le \bar{T}^*_s(k) + \epsilon. \qquad (22)$$
Policy $\pi_\epsilon$ operates recursively in (random) time intervals $[t_{m-1}, t_m)$, $m = 1, 2, \ldots$, called "epochs", as follows. Epoch 1 starts
at time $t_0 = 0$ at state $S_0 = s_0$ with $\hat{A}(0) = A(0) = k$ packets at the inputs, where $\hat{A}(t)$ denotes the cumulative arrivals up to time $t$; policy $\pi_{k,s_0}$ is used to evacuate the $k$ packets
by time $t_1 = T_{\pi_{k,s_0},\,s_0}(k)$, while any new packet arrivals during the epoch are kept at the inputs, but excluded from processing.
Let $S_m$ be the state of the system at time $t_m$. Epoch $m + 1$, $m \ge 1$, starts at time $t_m$ with $k_m = \hat{A}(t_m) - \hat{A}(t_{m-1})$ packets
at the inputs and policy $\pi_{k_m,S_m}$ is used to evacuate the $k_m$ packets by time $t_{m+1}$.
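
The recursive structure of the epoch-based policy can be illustrated on a toy system. The Python sketch below is not the general policy of Definition 14: it assumes a single input, a single state, Bernoulli arrivals (at most one packet per slot), and an evacuation policy that serves one packet per slot. All names are hypothetical, and the rule that an empty epoch idles for one slot is an assumption added only so that time advances.

```python
import random


def evacuation_time(k):
    """Toy evacuation policy: one packet per slot.

    Assumption of this sketch: an epoch that starts empty idles for one slot.
    """
    return max(k, 1)


def run_epoch_policy(arrival_prob, num_epochs, initial_packets=0, seed=0):
    """Simulate epochs; return the list of epoch lengths T_1, T_2, ..."""
    rng = random.Random(seed)
    k = initial_packets                  # packets present at the start of the epoch
    epoch_lengths = []
    for _ in range(num_epochs):
        length = evacuation_time(k)      # serve only the packets present at the start
        # Arrivals during the epoch wait at the input and form the next batch.
        arrivals = sum(rng.random() < arrival_prob for _ in range(length))
        epoch_lengths.append(length)
        k = arrivals
    return epoch_lengths


if __name__ == "__main__":
    print(run_epoch_policy(0.5, 10, initial_packets=50, seed=1))
```

In this toy setting the next batch can never exceed the length of the current epoch, so the epoch lengths are non-increasing; with arrival rate below 1 they shrink quickly from a large initial backlog.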

Let $T_m = t_m - t_{m-1}$, $m = 1, 2, \ldots$ be the length of the $m$-th epoch. Since the arrival process vector is i.i.d., if policies satisfy
the Basic Features and the Statistical Assumptions of Section II, the process $\{(T_m, S_m)\}_{m=1}^{\infty}$ constitutes a (homogeneous)
Markov chain with stationary transition probabilities. Note that with this formulation, the initial state of the Markov chain,
$(T_1, S_1)$, is a random variable whose distribution depends on $A(0)$ and $s_0$.

The main result of this section is the following. 

Theorem 15. For any $\lambda \ge 0$ such that

$$\hat{T}(\lambda) < 1, \qquad (23)$$

policy $\pi_\epsilon$ stabilizes the system.

The proof of this theorem is given in the Appendix. 
Remarks 

1) The epoch-based policy is non-anticipative (it does not require knowledge of future packet arrivals), but is sufficient to
attain the stability region even if anticipative policies are allowed, as explained in the footnote in the proof of Theorem 9.
Thus, the ability to anticipate future packet arrivals is not required for throughput optimality.

2) Note that stability depends solely on the fact that inequality (22) holds for large enough $|k|$. Hence, for the epoch-based
policy to be stable, it is sufficient for policies $\pi_{k,s}$ to satisfy (22) only for large enough $|k|$. In other words, asymptotically
optimal evacuation policies can be used to construct stabilizing epoch-based policies.

3) The requirement for the arrival process to be i.i.d. only applies for $t \ge 1$; the initial queue lengths, namely $A(0)$, may
have any distribution that is not necessarily the same as for $A(t)$, $t = 1, 2, \ldots$. By induction, it is easy to extend the
"exemption" up to any finite $t_0$ and only require the arrival process to be i.i.d. for $t > t_0$.

4) Similarly, the proof can be easily extended to the case where the arrival process is not i.i.d. for individual time slots but
is "block-i.i.d." with block length $D$; in other words, where the vectors $(A(i \cdot D + 1), \ldots, A((i + 1) \cdot D))$ are i.i.d. with
respect to $i$ for $i = 1, 2, \ldots$ (but arrivals may be interdependent within a "time block" $iD + 1 \le t \le (i + 1)D$). This is
achieved via time scaling by a factor of $D$, namely enforcing epoch durations to be multiples of $D$ (by simply requiring
the epoch-based policy to wait until the next multiple of $D$ after all packets from the start of the epoch are evacuated),
which allows the Markovian nature of the system to be maintained.

We conjecture that the epoch-based policy can be shown to be stabilizing for any general stationary and ergodic arrival 
process, but the necessary extension of the proof remains open at this stage. 

5) A policy which seems to be more amenable to analysis under stationary and ergodic arrivals is a frame-based policy
which operates in periods. At each period $n$, beginning at time $S_n$, a number of packets are processed. The packets under
processing in period $n$ have all arrived in the system before $S_n$ and correspond to a frame of arrivals of fixed duration
$F$. In particular, during the $n$-th period, only arrivals from the frame $[I_{n-1}, I_n)$ are processed, where $I_n - I_{n-1} = F$,
hence $I_n = nF$. The time to evacuate all the arrivals in the interval $[I_{n-1}, I_n)$ is random, depending on the number of
arrivals as well as other random events, and we denote it by $T_n(F)$. Note that if there is only one system state, then
if the arrival process is stationary, $T_n(F)$ is a stationary process as well.

Before the start of period $n + 1$, a waiting time is added if $S_n + T_n(F) < I_n + F$. This waiting is imposed in order to
ensure that $I_{n+1} - I_n = F$. By letting $D_n = S_n - I_n$ denote the lag process, it can be seen that

$$D_{n+1} = (D_n + T_n(F) - F)^+.$$

Note that this equation is of the same form as the recursion relating the queue size in a discrete G/G/1 queue with "arrival
rate" (per slot) $\bar{T}(F)$ and "service rate" $F$. Note that if $\hat{T}(\lambda) < 1$, then by picking $F$ large enough, we can ensure that
$\bar{T}(F) < F$, i.e., the "arrival rate" is less than the "service rate". We conjecture that this policy stabilizes the queue sizes
under stationary and ergodic arrivals. However, the policy is unattractive in practice since it induces very large delays
even for small arrival rates.
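
The lag recursion of the frame-based policy is a Lindley-type recursion and is easy to trace by hand. The Python sketch below is illustrative; the function name `lag_process` and the sample evacuation times are hypothetical. Given a sequence of evacuation times $T_n(F)$ and a frame length $F$, it returns the successive lags $D_n$.

```python
def lag_process(evacuation_times, frame_length, initial_lag=0):
    """Iterate D_{n+1} = max(D_n + T_n(F) - F, 0) and return all lags."""
    lags = [initial_lag]
    for t_n in evacuation_times:
        # The lag grows when evacuation overruns the frame and is reset
        # to zero (a waiting time is inserted) when it finishes early.
        lags.append(max(lags[-1] + t_n - frame_length, 0))
    return lags


if __name__ == "__main__":
    print(lag_process([12, 8, 15, 5], frame_length=10))  # [0, 2, 0, 5, 0]
```

When the evacuation times are on average shorter than the frame, the lag returns to zero repeatedly, which is the behaviour behind the conjecture stated above.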

VI. Application: Capacity and Stability regions of Broadcast Erasure Channel with Feedback 

Consider a communication system consisting of a single transmitter and a set $\mathcal{N} = \{1, 2, \ldots, n\}$ of receivers/users (we
hereafter use these two terms interchangeably). The transmitter has n infinite queues where packets destined to each of the 
receivers are stored. Packets consist of L bits and are transmitted within one slot. The channel is modeled as memoryless 
broadcast erasure (BE), so that each broadcast packet is either received unaltered by a user or is "erased" (i.e. the user does 
not receive the packet, but knows that a packet was sent). The latter case is equivalent to considering that the user receives the 
special symbol $E$, which is distinct from any other possible transmitted symbol and does not map to a physical packet (since
it models an erasure). To complete the description of the system we also need to specify the outputs when no packet is sent
by the transmitter, i.e., the slot is empty: in this case we assume that all receivers realize that the slot is empty. An empty slot
will be denoted by $\emptyset$. Equivalently, we may view "no transmission" as transmission of a special symbol $\emptyset$.

In information-theoretic terms, the broadcast erasure channel under consideration is described by the tuple $(\mathcal{X}, (\mathcal{Y}_i : i \in \mathcal{N}), p(Y_l \mid X_l))$,
where $\mathcal{X}$ is the input symbol alphabet, $\mathcal{Y}_i = \mathcal{Y} = \mathcal{X} \cup \{E\}$ is the output symbol alphabet for user $i$, and
$p(Y_l \mid X_l)$ is the probability of having, at slot $l$, output $Y_l = (Y_{i,l}, i \in \mathcal{N})$ for a broadcast input symbol $X_l$. The memoryless
property implies that $p(Y_l \mid X_l)$ is independent of $l$, so that it is simply written as $p(Y \mid X)$. We denote by $\epsilon_{\mathcal{M}_E}$, $\mathcal{M}_E \subseteq \mathcal{N}$,
the (common) probability that a transmitted packet (i.e., a symbol in $\mathcal{X} - \{\emptyset\}$) is erased by all users in the set $\mathcal{M}_E$. To avoid
unnecessary complications we assume in the following that $\epsilon_{\{i\}} < 1$ for all $i$. Note that for the empty slot (symbol $\emptyset$) we
have $\Pr(Y = (\emptyset, \ldots, \emptyset) \mid X = \emptyset) = 1$.

We assume that there is feedback from the users to the transmitter, so that at the end of each slot I, all users inform the 
transmitter whether the symbol was received or not (essentially, a simple ACK/NACK) through an error-free zero-delay control 
channel. 

We define two regions for this channel, the information theoretic capacity region, and the stability region. The information 
theoretic capacity region describes transmission rates under which it is possible to transmit sets of messages (one for each 
user) placed at the transmitter by using proper encoding, so that all users receive the messages destined to them with arbitrarily 
small probability of error. For the stability region, Definition Q is used, under the assumption that packets arrive randomly to 
the system. We assume that packets are transmitted using a proper encoding, such that they are decoded by the receivers with 
zero probability of error. 

We now give a precise definition of the two regions, and show in the following that they are identical. 

Information theoretic capacity region 

A channel code, denoted as $c_l = (M_1, \ldots, M_n, l)$, for the broadcast channel with feedback is defined as the aggregate of
the following components (this is an extension of the standard capacity definition of Q to n users): 

• Message sets $\mathcal{W}_i$ of size $|\mathcal{W}_i| = M_i$ for each user $i \in \mathcal{N}$, where $|\cdot|$ denotes set cardinality. Denote the message that needs
to be communicated as $W = (W_i, i \in \mathcal{N}) \in \mathcal{W}$, where $\mathcal{W} = \mathcal{W}_1 \times \ldots \times \mathcal{W}_n$. For our purposes it is helpful to interpret
the message set $\mathcal{W}_i$ as follows: assume that user $i$ needs to decode a given set $\mathcal{K}_i$ of $L$-bit packets. Then, $\mathcal{W}_i$ is the set
of all possible $|\mathcal{K}_i| L$-bit sequences, so that it holds $|\mathcal{W}_i| = M_i = 2^{|\mathcal{K}_i| L}$. Henceforth we will assume this relation.

• An encoder that transmits, at slot $t$, a symbol $X_t = f_t(W, Y^{t-1})$, based on the value of $W$ and all previously gathered
feedback $Y^{t-1} = (Y_1, \ldots, Y_{t-1})$, $Y_k = (Y_{1,k}, \ldots, Y_{n,k})$. $X_1$ is a function of $W$ only. A total of $l$ symbols are transmitted
for message $W$.

• $n$ decoders, one for each user $i \in \mathcal{N}$, represented by the decoding functions $g_i : \mathcal{Y}^l \to \mathcal{W}_i$ that map $Y_i^l$, where
$Y_i^l = (Y_{i,1}, \ldots, Y_{i,l})$ is the sequence of symbols received by user $i$ during the $l$ slots, to a message in $\mathcal{W}_i$.

In the following we write $(M_1, \ldots, M_n, l)$ to denote the code $c_l$, with the understanding that the full specification requires all
the components described above. The probability of erroneous decoding is defined as $q_l^e = \Pr(\cup_{i \in \mathcal{N}} \{g_i(Y_i^l) \neq W_i\})$, where it
is assumed that the messages are selected according to the uniform distribution from $\mathcal{W}$. The rate $R$ for this code, measured
in information bits per transmitted symbol, is now defined as the vector $R = (R_i : i \in \mathcal{N})$ with $R_i = (\log_2 M_i)/l$. Hence,
it holds $R_i = |\mathcal{K}_i| L / l = r_i L$, where $r_i = |\mathcal{K}_i| / l$ is the rate of the code in packets per slot, and the bits of each packet are
uniformly distributed and independent of the bits of the other packets. For our purposes, it will be convenient to define the
capacity region of the system in terms of the rate vector $r = R/L$.

A vector rate $r = (r_1, \ldots, r_n)$ is achievable if there exists a sequence $\{c_l\}_{l=1}^{\infty}$ of codes $(2^{\lceil l r_1 \rceil L}, \ldots, 2^{\lceil l r_n \rceil L}, l)$ such that
$q_l^e \to 0$ as $l \to \infty$. The capacity region $\mathcal{C}$ of the system is the closure of the set of achievable rates.

Stochastic Arrivals: Definitions of admissible policies 

As in Section II we assume that packets arrive randomly to the system according to the stochastic process $A(t)$ and are stored
in infinite buffers at the transmitter. We denote by $\mathcal{A}(t)$ the content of these messages, i.e., $\mathcal{A}(t) = (\mathcal{A}_1(t), \ldots, \mathcal{A}_n(t))$ where
$\mathcal{A}_i(t) = \{p_{i,1}(t), \ldots, p_{i,A_i(t)}(t)\}$, and $p_{i,j}(t)$ denotes the sequence of bits corresponding to the $j$th packet with destination
node $i$ that arrived at the transmitter at time $t$; if no packets arrive we consider that $\mathcal{A}_i(t)$ is the empty set. We assume
that $p_{i,j}(t)$ are uniformly distributed and mutually independent. We denote by $\mathcal{A}^t = (\mathcal{A}(0), \ldots, \mathcal{A}(t))$ the contents of all
packet arrivals up to time $t$.

An admissible policy consists of 

• An encoder that transmits, at slot $t$, a symbol $X_t = f_t(Y^{t-1}, \mathcal{A}^t)$ based on all previously gathered feedback, $Y^{t-1} =
(Y_1, \ldots, Y_{t-1})$, and the contents of packet arrivals up to time $t$, $\mathcal{A}^t$.
• $n$ decoders, one for each user $i \in \mathcal{N}$, represented by the decoding set-valued functions $g_{i,t}(Y_i^t)$ that at time $t$ map $Y_i^t$
to a subset $\mathcal{D}_t$ of the packets that have arrived up to time $t - 1$ with destination node $i$, i.e.,

$$\mathcal{D}_t \subseteq \{p_{i,j}(\tau) : \tau \le t - 1,\ 1 \le j \le A_i(\tau)\}.$$

A packet is decoded the first time it is included in $\mathcal{D}_t$. We set the requirement that packet decoding is correct with probability
one. Note that there is at least one policy that satisfies this requirement: this is the time-sharing policy where packets destined
to destination $i$ are (re)transmitted, in First-Come-First-Served order, until successful reception in slots specifically assigned
to $i$, say slots $jn + i - 1$, $j = 0, 1, \ldots$. We call such a policy One By One (OBO) policy, $\pi_0$. For this policy it can be easily

seen that

n 
i=l 

where $C_1$ depends on the erasure probabilities, but not on $k$. Hence, $\pi_0$ satisfies (5).
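
To make the One By One policy concrete, the following Python sketch simulates its evacuation time. It is a minimal illustration under stated assumptions, not the authors' implementation: the function name is hypothetical, users are indexed from 0 so that slot `t` is assigned to user `t % n`, and each transmission in a slot assigned to user `i` is erased for that user independently with probability `erasure_probs[i]` (assumed strictly below 1).

```python
import random


def one_by_one_evacuation_time(packets, erasure_probs, seed=0):
    """Number of slots until every user has received all of its packets.

    packets[i]       : number of packets destined to user i
    erasure_probs[i] : erasure probability seen by user i (must be < 1)
    """
    rng = random.Random(seed)
    remaining = list(packets)
    n = len(remaining)
    t = 0
    while any(remaining):
        user = t % n                       # slot t is reserved for this user
        if remaining[user] > 0 and rng.random() >= erasure_probs[user]:
            remaining[user] -= 1           # head-of-line packet received (ACK)
        t += 1                             # otherwise the packet is retransmitted later
    return t


if __name__ == "__main__":
    print(one_by_one_evacuation_time([2, 1], [0.0, 0.0]))      # 3 slots without erasures
    print(one_by_one_evacuation_time([20, 20], [0.3, 0.2]))
```

Because every packet of user `i` needs a geometric number of the slots reserved for `i`, the mean evacuation time of this policy grows at most linearly in the number of packets, in line with the bound referred to above.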

In order to apply the stability definition Q to the class of policies specified above, we must define the time instant at which 
a packet leaves the system. There are two ways to define this instant. According to the first, a packet is considered to leave the 
system when it is correctly decoded by the destination receiver. While this definition makes sense if one is interested in packet
delivery times, it does not capture the fact that a decoded packet may still be needed for further encoding and decoding, in 
which case the packet will keep occupying buffer space even after its correct decoding. Also, the feedback information may 
need to be stored in the buffers of the transmitter if needed for further encoding. To capture buffer requirements we assume 
that each of the receivers has infinite buffers where received packets are stored. We next introduce a second definition of queue 
size, where we take into account the following. 

1) Each transmission results in storing at most n packets, one at each receiver. These packets may be functions of "native" 
packets that have arrived exogenously at the transmitter, as well as the feedback received at the transmitter. Hence, in 
this case packets may be generated internally to the system during its operation. 

2) A packet stored at a receiver buffer departs when it is not needed for further decoding. 

3) A feedback packet is stored at the transmitter until it is not used for further encoding. 

4) A native packet departs from the receiver if a) it has been decoded by the receiver to which it is destined and b) it is 
not used for further encoding. 

If $Q_D^{\pi}(t)$ and $Q_B^{\pi}(t)$ respectively are the sum of queue sizes under $\pi$ according to the previous two definitions of packet
departure time (Delay, Buffer), it holds $Q_D^{\pi}(t) \le Q_B^{\pi}(t)$. Hence, if $\mathcal{S}_D$ and $\mathcal{S}_B$ are respectively the stability regions according
to the two definitions, it holds

$$\mathcal{S}_B \subseteq \mathcal{S}_D. \qquad (24)$$

Relation between Capacity and Stability Regions 

The distributed nature of the channel introduces some new issues that must be addressed in order to apply the results of the 
previous sections. Specifically, while the transmitter has full knowledge of the system through the channel feedback, this is 
not the case for the receivers. Transferring appropriate information to the receivers takes extra slots which must be accounted 
for. 

Note first that there are some differences in the information available at the receivers in the definition of the two regions 
given above. Specifically, in the capacity region definition, it is assumed that the receivers know the number of packets at the 
transmitter when the algorithm starts. On the other hand, when arrivals are stochastic, this information cannot be assumed a 
priori and if needed it must be communicated to the receivers. Also, in the capacity definition, all receivers under any admissible 
coding know implicitly the instant t at which the decoding process stops. For the stochastic arrival model, however, under 
a general evacuation policy, this may not be the case. Note that the One-By-One policy $\pi_0$ does not need the information
regarding the number of packets at the transmitter when the system starts. Also, an evacuation policy that is based on $\pi_0$
can be easily modified to inform the receivers about the end of the decoding process: when all packets to destination i are 
transmitted, an empty slot is transmitted in the next slot allocated to i, informing all receivers of this event. Hence if the last 
packet is delivered to the appropriate destination at time t, all receivers will know at time t + 1 that all packets are evacuated. 
Note that (0 still holds under this modification. We denote this modified policy as ir . 

Since it can be preagreed which evacuation policy to employ when a given number of packets k is initially at the transmitter, 
once that number is known by all receivers, the employed evacuation policy is also known by the receivers. 

In the following, we initially assume the following conditions (these conditions will be removed later). 

• When an evacuation policy starts, the number of packets at the transmitter is known to the receivers. 

• An evacuation policy ensures that all receivers realize the end of the evacuation process at some time $t$, which is defined
as the end of the evacuation process. 

Under these conditions, the arguments of Lemma 5 apply and hence $T^*(k)$ is again subadditive (we omit the subscript
describing states since the system under discussion has just a single state). Also, the arguments for © still hold (note that by 
placing the "dummy" packet in the argument last in the transmitter queue corresponding to receiver $i$, this receiver knows that
this packet contains no information and hence decoding error does not occur). Hence Theorem 6 holds for the current model.
We now claim that under the above stated assumptions,

Lemma 16. It holds,

$$\mathcal{C} = \mathcal{R} = \{r \ge 0 : \hat{T}(r) \le 1\}.$$



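Proof: We first show that $\mathcal{R} \subseteq \mathcal{C}$. For this, it suffices to show that if for some $r$ it holds $\hat{T}(r) < 1$, then there is a
sequence of codes $c_l = (2^{\lceil l r_1 \rceil L}, \ldots, 2^{\lceil l r_n \rceil L}, l)$ with $q_l^e \to 0$ as $l \to \infty$. Select $\delta > 0$ such that

$$\hat{T}(r) + 3\delta < 1. \qquad (25)$$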

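In the following, we denote, for any positive integers $l$ and $l_0$, $\alpha_l = \lfloor l / l_0 \rfloor$ and $\beta_l = l \bmod l_0$, i.e.

$$l = \alpha_l l_0 + \beta_l, \quad 0 \le \beta_l < l_0.$$

It follows that

$$\lceil l r \rceil \le (\alpha_l + 1) \lceil l_0 r \rceil. \qquad (26)$$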
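
Inequality (26) is what guarantees that $\alpha_l + 1$ batches of $\lceil l_0 r \rceil$ packets are enough to carry $\lceil l r \rceil$ packets. The short Python check below is only a sanity test with hypothetical names and arbitrarily chosen rates; it uses exact fractions to avoid rounding issues in the ceilings.

```python
import math
from fractions import Fraction


def check_batch_bound(l, l0, rates):
    """Return True if ceil(l*r_i) <= (alpha_l + 1) * ceil(l0*r_i) for all i."""
    alpha, beta = divmod(l, l0)            # l = alpha * l0 + beta, 0 <= beta < l0
    assert l == alpha * l0 + beta and 0 <= beta < l0
    return all(
        math.ceil(l * r) <= (alpha + 1) * math.ceil(l0 * r)
        for r in rates
    )


if __name__ == "__main__":
    rates = [Fraction(1, 3), Fraction(2, 5)]
    print(all(check_batch_bound(l, 7, rates) for l in range(1, 200)))
```

The bound holds because $l r_i \le (\alpha_l + 1) l_0 r_i \le (\alpha_l + 1) \lceil l_0 r_i \rceil$ and the right-hand side is an integer.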

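Select and fix $l_0$ large enough so that

$$\frac{\hat{T}^*(\lceil l_0 r \rceil)}{l_0} \le \hat{T}(r) + \delta. \qquad (27)$$

Select an evacuation policy $\pi_{l_0}$ such that

$$\bar{T}_{\pi_{l_0}}(\lceil l_0 r \rceil) \le \hat{T}^*(\lceil l_0 r \rceil) + l_0 \delta. \qquad (28)$$

Consider the following sequence of codes $c_l$ for transmitting $\lceil l r \rceil$ packets.

a) Use $\pi_{l_0}$ to transmit successively $\alpha_l + 1$ batches of $\lceil l_0 r \rceil$ packets (the last batch may contain dummy packets) until they
are decoded by all receivers. Let $T^j_{\pi_{l_0}}(\lceil l_0 r \rceil)$ be the (random) time it takes to transmit the $j$-th batch, and

$$\tilde{T}_{l_0}(\lceil l r \rceil) = \sum_{j=1}^{\alpha_l + 1} T^j_{\pi_{l_0}}(\lceil l_0 r \rceil).$$

b) If

$$\tilde{T}_{l_0}(\lceil l r \rceil) \le l$$

all packets are correctly decoded; else declare an error.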

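The probability of error for the sequence $c_l$ is computed as follows. Observing that

$$\lim_{l \to \infty} \alpha_l = \infty, \qquad \lim_{l \to \infty} \frac{\beta_l}{\alpha_l} = 0,$$

and taking into account (25), pick $\bar{l}$ large enough so that for all $l \ge \bar{l}$ it holds,

$$\frac{l}{l_0(\alpha_l + 1)} = \frac{\alpha_l}{\alpha_l + 1}\left(1 + \frac{\beta_l}{l_0 \alpha_l}\right) > \hat{T}(r) + 3\delta. \qquad (29)$$

Then,

$$\begin{aligned}
q_l^e &= \Pr\left\{\tilde{T}_{l_0}(\lceil l r \rceil) > l\right\} \\
&= \Pr\left\{\sum_{j=1}^{\alpha_l + 1} T^j_{\pi_{l_0}}(\lceil l_0 r \rceil) > \alpha_l l_0 + \beta_l\right\} \\
&= \Pr\left\{\frac{1}{\alpha_l + 1}\sum_{j=1}^{\alpha_l + 1} \frac{T^j_{\pi_{l_0}}(\lceil l_0 r \rceil)}{l_0} > \frac{\alpha_l}{\alpha_l + 1}\left(1 + \frac{\beta_l}{l_0 \alpha_l}\right)\right\} \\
&\le \Pr\left\{\frac{1}{\alpha_l + 1}\sum_{j=1}^{\alpha_l + 1} \frac{T^j_{\pi_{l_0}}(\lceil l_0 r \rceil)}{l_0} > \hat{T}(r) + 3\delta\right\} && \text{by (29)} \\
&= \Pr\left\{\frac{1}{\alpha_l + 1}\sum_{j=1}^{\alpha_l + 1} \frac{T^j_{\pi_{l_0}}(\lceil l_0 r \rceil)}{l_0} - \frac{\bar{T}_{\pi_{l_0}}(\lceil l_0 r \rceil)}{l_0} > \hat{T}(r) - \frac{\bar{T}_{\pi_{l_0}}(\lceil l_0 r \rceil)}{l_0} + 3\delta\right\} \\
&\le \Pr\left\{\frac{1}{\alpha_l + 1}\sum_{j=1}^{\alpha_l + 1} \frac{T^j_{\pi_{l_0}}(\lceil l_0 r \rceil)}{l_0} - \frac{\bar{T}_{\pi_{l_0}}(\lceil l_0 r \rceil)}{l_0} > \hat{T}(r) - \frac{\hat{T}^*(\lceil l_0 r \rceil)}{l_0} + 2\delta\right\} && \text{by (28)} \\
&\le \Pr\left\{\frac{1}{\alpha_l + 1}\sum_{j=1}^{\alpha_l + 1} \frac{T^j_{\pi_{l_0}}(\lceil l_0 r \rceil)}{l_0} - \frac{\bar{T}_{\pi_{l_0}}(\lceil l_0 r \rceil)}{l_0} > \delta\right\} && \text{by (27)}.
\end{aligned}$$

Due to the memorylessness of the channel and the fact that the bits in the packet contents are i.i.d., the random variables
$T^j_{\pi_{l_0}}(\lceil l_0 r \rceil)$, $j = 1, 2, \ldots$ are i.i.d. Using the fact that $\alpha_l \to \infty$, we conclude (by the law of large numbers)

$$\lim_{l \to \infty} \frac{1}{\alpha_l + 1}\sum_{j=1}^{\alpha_l + 1} T^j_{\pi_{l_0}}(\lceil l_0 r \rceil) = \bar{T}_{\pi_{l_0}}(\lceil l_0 r \rceil),$$

which implies that

$$\lim_{l \to \infty} q_l^e \le \lim_{l \to \infty} \Pr\left\{\frac{1}{\alpha_l + 1}\sum_{j=1}^{\alpha_l + 1} \frac{T^j_{\pi_{l_0}}(\lceil l_0 r \rceil)}{l_0} - \frac{\bar{T}_{\pi_{l_0}}(\lceil l_0 r \rceil)}{l_0} > \delta\right\} = 0.$$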



Next we show that $\mathcal{C} \subseteq \mathcal{R}$. Assume that $r \in \mathcal{C}$ so that there is a sequence of coding algorithms $c_l$ with rate $r$ whose error
probability approaches zero in the limit as $l \to \infty$. We then construct an evacuation policy $\pi_l$ for evacuating $\lceil l r \rceil$ packets as
follows.

a) For $\epsilon > 0$, select $l$ so that $q_l^e \le \epsilon$.

b) Follow the steps of $c_l$ for the first $l$ slots.

c) If all receivers decoded correctly, leave slot $l + 1$ empty, thus signaling to all receivers the end of the decoding process.

d) Else (i.e., if any of the receivers makes an error), send a dummy packet in slot $l + 1$ (thus informing the receivers that
decoding continues) and resend all the $\lceil l r \rceil$ packets using the one-by-one policy $\pi_0$.

Note that, since the transmitter knows $g_i$ and, through the received feedback, the sequence received by $i$, it knows whether
a receiver makes an error and hence the third step above is implementable.

We compute the average evacuation time of $\pi_l$ as follows. Let $\mathcal{E}$ be the event that all destinations have decoded the packets
in $l$ slots. Then, since on $\mathcal{E}^c$ it holds

$$T_{\pi_l}(\lceil l r \rceil) = l + T_{\pi_0}(\lceil l r \rceil) + 1,$$

$T_{\pi_0}(\lceil l r \rceil)$ is independent of $\mathcal{E}^c$, and by choice $\Pr\{\mathcal{E}^c\} = q_l^e \le \epsilon$, we have

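$$E\left[T_{\pi_l}(\lceil l r \rceil)\,1_{\mathcal{E}^c}\right] = l \Pr\{\mathcal{E}^c\} + \Pr\{\mathcal{E}^c\}\left(\bar{T}_{\pi_0}(\lceil l r \rceil) + 1\right) \le \epsilon\left(l + C_1 \sum_{i=1}^{n} \lceil l r_i \rceil + C_0 + 1\right).$$

Taking into account that $T_{\pi_l}(\lceil l r \rceil) = l + 1$ on $\mathcal{E}$,

$$E\left[T_{\pi_l}(\lceil l r \rceil)\right] = E\left[T_{\pi_l}(\lceil l r \rceil)\,1_{\mathcal{E}}\right] + E\left[T_{\pi_l}(\lceil l r \rceil)\,1_{\mathcal{E}^c}\right] \le l + 1 + \epsilon\left(l + C_1 \sum_{i=1}^{n} (l r_i + 1) + C_0 + 1\right).$$

Hence,

$$\frac{\hat{T}^*(\lceil l r \rceil)}{l} \le \frac{E\left[T_{\pi_l}(\lceil l r \rceil)\right]}{l} \le 1 + \frac{1}{l} + \epsilon\,\frac{l + C_1 \sum_{i=1}^{n} (l r_i + 1) + C_0 + 1}{l}.$$

Considering the limit as $l \to \infty$, we obtain,

$$\hat{T}(r) \le 1 + \epsilon\left(1 + C_1 \sum_{i=1}^{n} r_i\right)$$

and since $\epsilon$ is arbitrary we conclude

$$\hat{T}(r) \le 1.$$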

■ 

It remains to relate $\mathcal{R}$ to $\mathcal{S}_D$ and $\mathcal{S}_B$ under the current model. Revisit the proof of Theorem 9 and use a policy that is stable under the delay-based definition (i.e., one associated with $\mathcal{S}_D$) for
the first $l$ slots. If all packets are decoded correctly by slot $l$, leave slot $l + 1$ empty, thus informing all receivers of successful
decoding. Else send a dummy packet in slot $l + 1$ and afterwards apply the One-By-One policy $\pi_0$ as the evacuation policy in the proof
to evacuate the remaining packets. With these modifications, the proof can be used to show that

$$\mathcal{S}_D \subseteq \mathcal{R}. \qquad (30)$$

We now consider the implementation of the Epoch-Based policy $\pi_\epsilon$ under the current model. This policy selects a particular
evacuation policy for each epoch, which is a function of the number of packets $k$ at the beginning of the epoch. In order
to implement $\pi_\epsilon$ in the current model, the receivers must generally know $k$ at the beginning of an epoch. The transfer of
information about the number $k$ is done by transmitting $O\left(\sum_{i=1}^{n} \log(k_i + 1)\right)$ packets (for example, using the One-By-One
policy $\pi_0$) and hence the average number of slots to achieve this transfer is $O\left(\sum_{i=1}^{n} \log(k_i + 1)\right)$. This increases the length of
the evacuation period but since the increase is logarithmic in the number of packets, it does not affect the stability arguments.
Note also that once an epoch ends, all $k$ packets, as well as the feedback information and the packets stored at the receivers
can be discarded since they are not used for further decoding by $\pi_\epsilon$. Hence we conclude that

$$\mathcal{R} \subseteq \mathcal{S}_B. \qquad (31)$$

Taking into account (24), (30), (31) we finally conclude,
Theorem 17. It holds,

$$\mathcal{C} = \mathcal{R} = \mathcal{S}_D = \mathcal{S}_B.$$
Appendix A
Proof of Theorem 6

A result analogous to Theorem 6 has been derived in [8] for subadditive functions defined on $\mathbb{R}^n$. The extension of the Critical
Evacuation Time Function to $\mathbb{R}_0^n$ given in (10) is not necessarily subadditive and hence we need different arguments to show
the result, albeit using similar ideas.

Let $f(k) : \mathbb{N}_0^n \to \mathbb{R}$ be a subadditive function. Let $\mathcal{U}$ be the set of $n$-dimensional vectors whose coordinates are either
zero or one, and define,

$$U = \max_{u \in \mathcal{U}} f(u).$$

We will need the following lemma.

Lemma 18. For any $k \in \mathbb{N}_0^n - \{0\}$, it holds

$$f(k) \le U \max_i k_i.$$

Proof: Assume without loss of generality that for some $c \le n$, $0 < k_1 \le k_2 \le \ldots \le k_c$ and, in case $c < n$, then
$k_{c+1} = \ldots = k_n = 0$. Write,

$$k = \sum_{i=1}^{c} (k_i - k_{i-1})\,u_i,$$

where $k_0 = 0$ and

$$u_{i,j} = \begin{cases} 0 & \text{if } i > 1 \text{ and } j = 1, \ldots, i - 1 \\ 1 & \text{if } j = i, \ldots, c \\ 0 & \text{if } j > c. \end{cases}$$

By subadditivity we have,

$$f(k) \le \sum_{i=1}^{c} (k_i - k_{i-1}) f(u_i) \le U k_c.$$

■
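
The decomposition used in the proof of Lemma 18 can be traced on a small example. The Python sketch below is illustrative only: all names are hypothetical, the decomposition is written for coordinates in arbitrary order (so no sorting is assumed), and the test function `f_example` is the state-$s_1$ evacuation time of Example 12, chosen here because it is subadditive.

```python
from itertools import product


def staircase_decomposition(k):
    """Return [(weight, u)] with k = sum(weight * u) and each u in {0,1}^n."""
    levels = sorted(set(v for v in k if v > 0))
    terms, prev = [], 0
    for level in levels:
        u = tuple(1 if v >= level else 0 for v in k)   # coordinates reaching this level
        terms.append((level - prev, u))
        prev = level
    return terms


def rebuild(terms, n):
    """Recombine the weighted 0/1 vectors into the original vector."""
    return tuple(sum(w * u[i] for w, u in terms) for i in range(n))


def lemma18_bound(f, k):
    """Right-hand side of Lemma 18: U * max_i k_i, with U the max of f on {0,1}^n."""
    big_u = max(f(u) for u in product((0, 1), repeat=len(k)))
    return big_u * max(k)


def f_example(k):
    """Subadditive test function: k1 + k2 plus one switchover slot if k2 > 0."""
    return k[0] + k[1] + (1 if k[1] > 0 else 0)


if __name__ == "__main__":
    k = (5, 2)
    terms = staircase_decomposition(k)
    print(terms, rebuild(terms, len(k)))
    print(f_example(k), sum(w * f_example(u) for w, u in terms), lemma18_bound(f_example, k))
```

The weights of the decomposition sum to $\max_i k_i$, which is why bounding each $f(u_i)$ by $U$ gives the stated result.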

Next we extend the definition of $f(k)$ to $\mathbb{R}_0^n$ by defining

$$f(r) = f(\lceil r \rceil), \quad r \in \mathbb{R}_0^n.$$

We then have the following theorem.
Theorem 19. For any $r \in \mathbb{R}_0^n$, the limit function

$$\hat{f}(r) = \lim_{t \to \infty} \frac{f(t r)}{t} \qquad (32)$$

exists, is finite and positively homogeneous.

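Proof: Assume without loss of generality that $r_1 \ge r_2 \ge \ldots \ge r_n$. If $r_1 = 0$ then $r = 0$ and (32) is obvious. Assume
next that for some $c$, $1 \le c \le n$, $r_c > 0$ and $r_{c+1} = 0$. For consistency define $r_{n+1} = 0$.
Let $\epsilon > 0$ and $\beta = \liminf_{t \to \infty} f(t r)/t \ge 0$. Using Lemma 18 we have,

$$\frac{f(t r)}{t} = \frac{f(\lceil t r \rceil)}{t} \le U\,\frac{\max_i \{\lceil t r_i \rceil\}}{t} \le U\,\frac{\max_i \{t r_i\} + 1}{t} = U\left(\max_i \{r_i\} + \frac{1}{t}\right).$$

Hence, $\beta < \infty$.

To show existence of the limit in (32), it suffices to show that

$$\limsup_{t \to \infty} \frac{f(t r)}{t} \le \beta + \delta(\epsilon), \qquad (33)$$

where $\lim_{\epsilon \to 0} \delta(\epsilon) = 0$.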



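By definition of $\beta$, there are infinitely many $t$ such that $f(t\mathbf{r})/t \le \beta + \epsilon$. Since we also have

$$r_i \le \frac{\lceil t r_i \rceil}{t} \le r_i + \frac{1}{t}, \tag{34}$$

we can pick $t_0$ large enough so that the following inequalities hold

$$\frac{f(t_0 \mathbf{r})}{t_0} \le \beta + \epsilon, \tag{35}$$

$$r_i \le \frac{\lceil t_0 r_i \rceil}{t_0} \le r_i + \epsilon, \quad i = 1, \ldots, n. \tag{36}$$

Using Euclidean division, write for $i = 1, \ldots, c$

$$\lceil t r_i \rceil = l_{t,i} \lceil t_0 r_i \rceil + v_{t,i}, \quad 0 \le v_{t,i} < \lceil t_0 r_i \rceil. \tag{37}$$

If $c < n$, define also,

$$l_{t,i} = v_{t,i} = 0, \quad i = c+1, \ldots, n. \tag{38}$$

We then have,

$$f(t\mathbf{r}) = f(\lceil t\mathbf{r} \rceil) \le f\left( l_{t,1} \lceil t_0 r_1 \rceil, \ldots, l_{t,n} \lceil t_0 r_n \rceil \right) + f(\mathbf{v}_t) \quad \text{by subadditivity}. \tag{39}$$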

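Next, write

$$\begin{pmatrix} l_{t,1} \lceil t_0 r_1 \rceil \\ \vdots \\ l_{t,n} \lceil t_0 r_n \rceil \end{pmatrix} = \sum_{j=1}^{c} (l_{t,j} - l_{t,j-1}) \mathbf{v}_j,$$

where $l_{t,0} = 0$ and the $i$th coordinate of $\mathbf{v}_j$, $v_{j,i}$, is defined for $1 \le j \le c$ as,

$$v_{j,i} = \begin{cases} 0 & \text{if } j \ne 1 \text{ and } i = 1, \ldots, j-1, \\ \lceil t_0 r_i \rceil & \text{if } i = j, \ldots, n. \end{cases} \tag{40}$$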

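Notice that since $r_j > 0$ it holds $l_{t,j-1} \le l_{t,j}$, $1 \le j \le c$. Using subadditivity, we then have from (39),

$$f(t\mathbf{r}) \le \sum_{j=1}^{c} (l_{t,j} - l_{t,j-1}) f(\mathbf{v}_j) + f(\mathbf{v}_t).$$

Hence,

$$\frac{f(t\mathbf{r})}{t} \le \sum_{j=1}^{c} \frac{(l_{t,j} - l_{t,j-1}) t_0}{t} \frac{f(\mathbf{v}_j)}{t_0} + \frac{f(\mathbf{v}_t)}{t} = \frac{l_{t,1} t_0}{t} \frac{f(t_0 \mathbf{r})}{t_0} + \sum_{j=2}^{c} \frac{(l_{t,j} - l_{t,j-1}) t_0}{t} \frac{f(\mathbf{v}_j)}{t_0} + \frac{f(\mathbf{v}_t)}{t}. \tag{41}$$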

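By (37), (38), $\mathbf{v}_t$ takes a finite number of values, hence $f(\mathbf{v}_t)$ is a bounded sequence, and

$$\lim_{t \to \infty} \frac{f(\mathbf{v}_t)}{t} = 0.$$

Also, from (34), (36) and (37) we have for $1 \le i \le c$,

$$r_i \le \frac{\lceil t r_i \rceil}{t} = \frac{l_{t,i} \lceil t_0 r_i \rceil}{t} + \frac{v_{t,i}}{t} \le \frac{l_{t,i} t_0}{t} (r_i + \epsilon) + \frac{v_{t,i}}{t},$$

$$r_i + \frac{1}{t} \ge \frac{\lceil t r_i \rceil}{t} = \frac{l_{t,i} \lceil t_0 r_i \rceil}{t} + \frac{v_{t,i}}{t} \ge \frac{l_{t,i} t_0}{t} r_i + \frac{v_{t,i}}{t},$$

hence, using the fact that $\mathbf{v}_t$ is a bounded sequence, we conclude

$$1 - \frac{\epsilon}{r_c} \le 1 - \frac{\epsilon}{r_i} \le \frac{r_i}{r_i + \epsilon} \le \liminf_{t \to \infty} \frac{l_{t,i} t_0}{t} \le \limsup_{t \to \infty} \frac{l_{t,i} t_0}{t} \le 1.$$

Taking into account the latter inequalities and (35) we have from (41),

$$\limsup_{t \to \infty} \frac{f(t\mathbf{r})}{t} \le (\beta + \epsilon) \limsup_{t \to \infty} \frac{l_{t,1} t_0}{t} + \sum_{j=2}^{c} \frac{f(\mathbf{v}_j)}{t_0} \left( \limsup_{t \to \infty} \frac{l_{t,j} t_0}{t} - \liminf_{t \to \infty} \frac{l_{t,j-1} t_0}{t} \right)$$

$$\le \beta + \epsilon + \frac{\epsilon}{r_c} \sum_{j=2}^{c} \frac{f(\mathbf{v}_j)}{t_0}$$

$$\le \beta + \epsilon + \frac{\epsilon}{r_c} U c \, \frac{\max_i \lceil t_0 r_i \rceil}{t_0} \quad \text{by Lemma 18 and (40)}$$

$$\le \beta + \epsilon + \frac{\epsilon}{r_c} U n (r_1 + \epsilon) \quad \text{by (36)}.$$

Hence (33) holds with $\delta(\epsilon) = \epsilon + \frac{\epsilon}{r_c} U n (r_1 + \epsilon)$.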

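Positive homogeneity follows immediately since for $\alpha > 0$,

$$\hat{f}(\alpha \mathbf{r}) = \lim_{t \to \infty} \frac{f(t \alpha \mathbf{r})}{t} = \alpha \lim_{t \to \infty} \frac{f(t \alpha \mathbf{r})}{\alpha t} = \alpha \hat{f}(\mathbf{r}).$$

■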
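The convergence asserted in Theorem 19 can be observed numerically. The sketch below is an illustration under stated assumptions: `f` is a hypothetical subadditive function (not the one studied in the paper), extended to nonnegative reals through coordinatewise ceilings as in the text, and `f_real` and `ratio` are names introduced here. For this particular `f`, the ratio $f(t\mathbf{r})/t$ approaches $\max_i r_i$.

```python
from math import ceil, isqrt


def f(k):
    """Hypothetical subadditive function on N_0^n (illustration only)."""
    total = sum(k)
    root = 0 if total == 0 else isqrt(total - 1) + 1  # ceil(sqrt(total))
    return max(k) + root


def f_real(r):
    """Extension to nonnegative reals: f(r) = f(ceil(r)), coordinatewise."""
    return f(tuple(ceil(x) for x in r))


def ratio(r, t):
    """The quantity f(t r) / t whose limit as t grows is studied above."""
    return f_real(tuple(t * x for x in r)) / t


if __name__ == "__main__":
    r = (0.5, 1.25, 0.0)
    for t in (1, 10, 100, 10_000, 1_000_000):
        print(t, ratio(r, t))
```

Evaluating `ratio` at a scaled vector such as `(1.0, 2.5, 0.0)` gives approximately twice the value at `r` for large `t`, which is the positive homogeneity property in numerical form.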

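The next lemma is needed to establish further properties of $f(\mathbf{k})$ in Theorem 21 below.

Lemma 20. Let a subadditive function $f(\mathbf{k})$, $\mathbf{k} \in \mathbb{N}_0^n$ satisfy

$$f(\mathbf{k}) - f(\mathbf{k} + \mathbf{e}_i) \le D_0. \tag{42}$$

Then the following holds with $D = \max\{f(\mathbf{e}_1), \ldots, f(\mathbf{e}_n), D_0\}$.

$$|f(\mathbf{k}) - f(\mathbf{k} + \mathbf{e}_i)| \le D, \quad \text{for all } i = 1, \ldots, n. \tag{43}$$

$$|f(\mathbf{k}) - f(\mathbf{m})| \le D \sum_{i=1}^{n} |k_i - m_i| \tag{44}$$

$$|f(\mathbf{r}) - f(\mathbf{s})| \le D \sum_{i=1}^{n} |r_i - s_i| + nD \tag{45}$$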



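Proof: By subadditivity,

$$f(\mathbf{k} + \mathbf{e}_i) \le f(\mathbf{k}) + f(\mathbf{e}_i),$$

hence,

$$f(\mathbf{k} + \mathbf{e}_i) - f(\mathbf{k}) \le \max_i f(\mathbf{e}_i) = D_1.$$

Taking into account (42) we conclude,

$$|f(\mathbf{k} + \mathbf{e}_i) - f(\mathbf{k})| \le \max\{D_1, D_0\} = D,$$

which shows (43).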

To show (44) we use backward induction on the number $c$ of coordinates of $\mathbf{k}$, $\mathbf{m}$ that are equal. If $c = n$ then clearly (44) holds. Let (44) hold for $c \le n$ and assume without loss of generality that $k_i = m_i$, $i = 1, \ldots, c-1$ and $k_i \ne m_i$, $i \ge c$, $k_c > m_c$. We then have

$$|f(\mathbf{k}) - f(\mathbf{m})| = |f(\mathbf{k}) - f(k_1, \ldots, k_{c-1}, m_c, k_{c+1}, \ldots, k_n) + f(k_1, \ldots, k_{c-1}, m_c, k_{c+1}, \ldots, k_n) - f(\mathbf{m})|$$

$$\le |f(\mathbf{k}) - f(k_1, \ldots, k_{c-1}, m_c, k_{c+1}, \ldots, k_n)| + |f(m_1, \ldots, m_{c-1}, m_c, k_{c+1}, \ldots, k_n) - f(\mathbf{m})|$$

$$\le |f(\mathbf{k}) - f(k_1, \ldots, k_{c-1}, m_c, k_{c+1}, \ldots, k_n)| + D \sum_{i=c+1}^{n} |k_i - m_i| \quad \text{by the inductive hypothesis}.$$

Now, write

$$|f(\mathbf{k}) - f(k_1, \ldots, k_{c-1}, m_c, k_{c+1}, \ldots, k_n)| = \left| \sum_{i=0}^{k_c - m_c - 1} \left[ f(k_1, \ldots, k_{c-1}, m_c + i + 1, k_{c+1}, \ldots, k_n) - f(k_1, \ldots, k_{c-1}, m_c + i, k_{c+1}, \ldots, k_n) \right] \right|$$

$$\le \sum_{i=0}^{k_c - m_c - 1} \left| f(k_1, \ldots, k_{c-1}, m_c + i + 1, k_{c+1}, \ldots, k_n) - f(k_1, \ldots, k_{c-1}, m_c + i, k_{c+1}, \ldots, k_n) \right|$$

$$\le \sum_{i=0}^{k_c - m_c - 1} D \quad \text{by (43)}$$

$$= D |k_c - m_c|,$$

and hence,

$$|f(\mathbf{k}) - f(\mathbf{m})| \le D \sum_{i=c}^{n} |k_i - m_i| = D \sum_{i=1}^{n} |k_i - m_i| \quad \text{since } k_i = m_i, \ i = 1, \ldots, c-1,$$

i.e., the inductive hypothesis holds for $c - 1$ as well.
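Finally, for (45), write

$$|f(\mathbf{r}) - f(\mathbf{s})| = |f(\lceil \mathbf{r} \rceil) - f(\lceil \mathbf{s} \rceil)|$$

$$\le D \sum_{i=1}^{n} |\lceil r_i \rceil - \lceil s_i \rceil| \quad \text{by (44)}$$

$$\le D \sum_{i=1}^{n} |r_i - s_i| + Dn, \quad \text{since } |\lceil r_i \rceil - \lceil s_i \rceil| \le |r_i - s_i| + 1.$$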
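The Lipschitz-type bound (44) of Lemma 20 can also be checked by brute force on a small grid. The sketch below is an illustration only: `f` is a hypothetical subadditive function that is nondecreasing in each coordinate, so condition (42) holds with $D_0 = 0$ for this example, and `lemma20_constant` and `check_bound_44` are names assumed here.

```python
from itertools import product
from math import isqrt


def f(k):
    """Hypothetical subadditive, coordinatewise nondecreasing function."""
    total = sum(k)
    root = 0 if total == 0 else isqrt(total - 1) + 1  # ceil(sqrt(total))
    return max(k) + root


def lemma20_constant(n: int, d0: int = 0) -> int:
    """D = max{f(e_1), ..., f(e_n), D_0}, with D_0 = 0 assumed for this f."""
    unit = [tuple(1 if j == i else 0 for j in range(n)) for i in range(n)]
    return max([f(e) for e in unit] + [d0])


def check_bound_44(n: int, kmax: int) -> bool:
    """Check |f(k) - f(m)| <= D * sum_i |k_i - m_i| on the grid {0..kmax}^n."""
    D = lemma20_constant(n)
    grid = list(product(range(kmax + 1), repeat=n))
    return all(
        abs(f(k) - f(m)) <= D * sum(abs(a - b) for a, b in zip(k, m))
        for k in grid
        for m in grid
    )


if __name__ == "__main__":
    print(lemma20_constant(2), check_bound_44(2, 5))
```

The same grid could be reused to confirm (43), since (43) is the special case of (44) in which $\mathbf{m} = \mathbf{k} + \mathbf{e}_i$.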

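The next theorem provides further useful properties of $\hat{f}(\mathbf{r})$ under condition (42).

Theorem 21. If a subadditive function $f(\mathbf{k})$, $\mathbf{k} \in \mathbb{N}_0^n$ satisfies (42), then the limit function

$$\hat{f}(\mathbf{r}) = \lim_{t \to \infty} \frac{f(t\mathbf{r})}{t}$$

is subadditive, convex, Lipschitz continuous, i.e., it holds

$$|\hat{f}(\mathbf{r}) - \hat{f}(\mathbf{s})| \le D \sum_{i=1}^{n} |r_i - s_i|,$$

and for any sequence $\mathbf{r}_t \in \mathbb{R}_0^n$ such that

$$\lim_{t \to \infty} \mathbf{r}_t = \boldsymbol{\lambda} < \infty,$$

it holds

$$\lim_{t \to \infty} \frac{f(t\mathbf{r}_t)}{t} = \hat{f}(\boldsymbol{\lambda}). \tag{46}$$

Proof: To show subadditivity, we proceed as follows. Since for any $a$, $b$ it holds

$$\lceil a + b \rceil + x = \lceil a \rceil + \lceil b \rceil \quad \text{for some } x \in \{0, 1, 2\},$$

we write

$$\lceil t(\mathbf{r}_1 + \mathbf{r}_2) \rceil + \mathbf{x} = \lceil t\mathbf{r}_1 \rceil + \lceil t\mathbf{r}_2 \rceil.$$

Also, by (44),

$$f(\lceil t(\mathbf{r}_1 + \mathbf{r}_2) \rceil) - f(\lceil t(\mathbf{r}_1 + \mathbf{r}_2) \rceil + \mathbf{x}) \le D \sum_{i=1}^{n} x_i \le 2nD.$$

Hence,

$$f(t(\mathbf{r}_1 + \mathbf{r}_2)) - 2nD \le f(\lceil t(\mathbf{r}_1 + \mathbf{r}_2) \rceil + \mathbf{x}) = f(\lceil t\mathbf{r}_1 \rceil + \lceil t\mathbf{r}_2 \rceil) \le f(t\mathbf{r}_1) + f(t\mathbf{r}_2).$$

Dividing the last inequality by $t$ and taking limits shows that $\hat{f}(\mathbf{r}_1 + \mathbf{r}_2) \le \hat{f}(\mathbf{r}_1) + \hat{f}(\mathbf{r}_2)$.

Convexity follows easily from positive homogeneity and subadditivity: for $0 \le p \le 1$,

$$\hat{f}(p\mathbf{r}_1 + (1-p)\mathbf{r}_2) \le \hat{f}(p\mathbf{r}_1) + \hat{f}((1-p)\mathbf{r}_2) = p\hat{f}(\mathbf{r}_1) + (1-p)\hat{f}(\mathbf{r}_2).$$

Lipschitz continuity follows easily as well from (45) by replacing $\mathbf{r}$, $\mathbf{s}$ with $t\mathbf{r}$, $t\mathbf{s}$, dividing by $t$ and taking limits.

Finally let

$$\lim_{t \to \infty} \mathbf{r}_t = \boldsymbol{\lambda} < \infty.$$

Using (45) write

$$\left| \frac{f(t\mathbf{r}_t)}{t} - \hat{f}(\boldsymbol{\lambda}) \right| \le \left| \frac{f(t\mathbf{r}_t)}{t} - \frac{f(t\boldsymbol{\lambda})}{t} \right| + \left| \frac{f(t\boldsymbol{\lambda})}{t} - \hat{f}(\boldsymbol{\lambda}) \right|$$

$$\le D \sum_{i=1}^{n} |r_{t,i} - \lambda_i| + \frac{nD}{t} + \left| \frac{f(t\boldsymbol{\lambda})}{t} - \hat{f}(\boldsymbol{\lambda}) \right|.$$

Taking limits in the last inequality shows (46). ■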
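Theorem 6 will follow directly from Theorems 19, 21 if we verify that the critical evacuation time function satisfies (42). But this follows easily from (6) since

$$T^*(\mathbf{k}) - T^*(\mathbf{k} + \mathbf{e}_i) = \max_s \bar{T}_s^*(\mathbf{k}) - \max_s \bar{T}_s^*(\mathbf{k} + \mathbf{e}_i)$$

$$\le \max_s \left\{ \bar{T}_s^*(\mathbf{k}) - \bar{T}_s^*(\mathbf{k} + \mathbf{e}_i) \right\}$$

$$\le D_0, \quad \text{by (6)}.$$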



Appendix B 
Proof of Theorem 13

In the discussion that follows we use the terminology and related results in [9]. For $x, y \in \mathcal{Q}$, if $x$ leads to $y$, we write $x \rightsquigarrow y$ and if $x$ communicates with $y$, $x \leftrightsquigarrow y$. A Markov Chain with countable state space $\mathcal{Q}$ is called irreducible if all states in $\mathcal{Q}$ belong to the same essential class, i.e., all states communicate with each other.

The proof of stability of the Epoch Based Policy is based on the following theorem, see [10], [11].

Theorem 22. Let $\{X_m\}_{m=1}^{\infty}$ be a homogeneous, irreducible and aperiodic Markov Chain with countable state space $\mathcal{Q}$. Let $v(x)$ be a nonnegative real function defined on the state space (Lyapunov function). If there exists a finite set $A \subset \mathcal{Q}$ such that

$$v(x) > \epsilon > 0, \quad x \in A^c = \mathcal{Q} - A,$$

$$E[v(X_2) \,|\, X_1 = x] < \infty, \quad x \in A, \tag{47}$$

and for some $\delta$, $1 > \delta > 0$,

$$E[v(X_2) \,|\, X_1 = x] \le (1 - \delta) v(x), \quad x \in A^c, \tag{48}$$

then the Markov Chain is geometrically ergodic (positive recurrent) and $E[v(X)] < \infty$, where $X$ has the steady-state distribution of $\{X_m\}_{m=1}^{\infty}$.



For the general model under consideration in the current work, irreducibility and aperiodicity may not hold. Hence, we need some preparatory work to use Theorem 22. The following lemma will be useful.

Lemma 23. Let $\{X_m\}_{m=1}^{\infty}$ be a homogeneous Markov Chain, not necessarily irreducible and/or aperiodic.

a) With the notation of Theorem 22, conditions (47) and (48) imply

$$E[v(X_2) \,|\, X_1 = x] \le U + (1 - \delta) v(x) \quad \text{for all } x \in \mathcal{Q}, \tag{49}$$

where $U = \max_{x \in A} E[v(X_2) \,|\, X_1 = x]$.

b) Conversely, if $v(x) \ge 0$ and there are constants $U > 0$, $\delta$, $\delta_1$, $0 < \delta_1 < \delta < 1$, and a finite set $B$ such that (49) holds and

$$\frac{U}{v(x)} \le \delta_1 \quad \text{for all } x \in B^c, \tag{50}$$

then (47) and (48) hold with $A \leftarrow B$ and $\delta \leftarrow \delta - \delta_1$.

c) If (49) holds, then for $m \ge 2$,

$$E[v(X_m) \,|\, X_1 = x] \le \frac{U}{\delta} + (1 - \delta)^{m-1} v(x). \tag{51}$$

Proof: It is clear that (47) and (48) imply (49). Assume now that (49) and (50) hold. Then clearly (47) is satisfied for all $x \in B$. Also, since the following holds for $x \in B^c$,

$$E[v(X_2) \,|\, X_1 = x] \le \left( 1 - \delta + \frac{U}{v(x)} \right) v(x) \le (1 - (\delta - \delta_1)) v(x),$$

it follows that (48) is satisfied for $x \in B^c$ with $\delta \leftarrow \delta - \delta_1$.

To prove (51), write

$$E[v(X_m) \,|\, X_1 = x] = E\left[ E[v(X_m) \,|\, X_{m-1}, X_1 = x] \,\big|\, X_1 = x \right]$$

$$\le U + (1 - \delta) E[v(X_{m-1}) \,|\, X_1 = x] \quad \text{by the Markov property and (49)},$$

and hence by induction, since $X_m$ is reached from $X_1$ in $m - 1$ transitions,

$$E[v(X_m) \,|\, X_1 = x] \le U \sum_{i=0}^{m-2} (1 - \delta)^i + (1 - \delta)^{m-1} v(x)$$

$$\le \frac{U}{\delta} + (1 - \delta)^{m-1} v(x).$$

■
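The induction behind part c) of Lemma 23 amounts to iterating a one-step linear bound. The sketch below illustrates this numerically under stated assumptions: the values of `U`, `delta` and `v0` are arbitrary illustrative numbers, `j` counts the number of one-step transitions applied to the drift bound (49), and the function names are introduced here only for the example.

```python
def drift_bounds(U: float, delta: float, v0: float, steps: int):
    """Iterate a_{j+1} = U + (1 - delta) * a_j starting from a_0 = v0."""
    values = [v0]
    for _ in range(steps):
        values.append(U + (1.0 - delta) * values[-1])
    return values


def closed_form(U: float, delta: float, v0: float, j: int) -> float:
    """Upper bound U / delta + (1 - delta)**j * v0 after j transitions."""
    return U / delta + (1.0 - delta) ** j * v0


def bound_holds(U: float, delta: float, v0: float, steps: int) -> bool:
    """Check that every iterate stays below the closed-form bound."""
    iterates = drift_bounds(U, delta, v0, steps)
    return all(
        a <= closed_form(U, delta, v0, j) + 1e-9
        for j, a in enumerate(iterates)
    )


if __name__ == "__main__":
    print(bound_holds(U=3.0, delta=0.2, v0=50.0, steps=40))
    print(drift_bounds(3.0, 0.2, 50.0, 40)[-1], 3.0 / 0.2)
```

The iterates settle near `U / delta`, which is why a bound of this form is uniform in the number of steps; this uniformity is what the later Fatou argument relies on.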

The next lemma states that when (21) holds, the Markov process described in Section V, namely $\{(T_m, S_m)\}_{m=1}^{\infty}$ (where $T_m$ is the duration of the $m$-th epoch and $S_m$ is the system state at the end of the $m$-th epoch) has the drift property described in Lemma 23.

Lemma 24. For the Markov process $\{(T_m, S_m)\}_{m=1}^{\infty}$ define $v((\tau, s)) = \tau$. If $\hat{T}(\boldsymbol{\lambda}) < 1$ then there are $U > 0$ and $\delta > 0$ such that

$$E[v((T_2, S_2)) \,|\, (T_1, S_1) = (\tau, s)] \le U + (1 - \delta) v((\tau, s)) \quad \text{for all } (\tau, s) \in \mathcal{Q},$$

and (50) is also satisfied.

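Proof: Using the definition of $v$, and the fact that given $\mathbf{k}_1$ and $S_1$, $T_2$ is independent of $T_1$, write,

$$E[v((T_2, S_2)) \,|\, (T_1, S_1) = (\tau, s)] = E[T_2 \,|\, T_1 = \tau, S_1 = s]$$

$$= E\left[ E[T_2 \,|\, T_1 = \tau, S_1 = s, \mathbf{k}_1(\tau)] \,\big|\, (T_1, S_1) = (\tau, s) \right]$$

$$= E\left[ \bar{T}_s^{\pi_\epsilon}(\mathbf{k}_1(\tau)) \right]. \tag{52}$$

We have by construction of $\pi_\epsilon$,

$$E\left[ \bar{T}_s^{\pi_\epsilon}(\mathbf{k}_1(\tau)) \right] \le E\left[ \bar{T}_s^*(\mathbf{k}_1(\tau)) \right] + \epsilon \le E\left[ T^*(\mathbf{k}_1(\tau)) \right] + \epsilon. \tag{53}$$

Since the arrival process vectors are i.i.d., it holds with probability 1,

$$\lim_{\tau \to \infty} \frac{\mathbf{k}_1(\tau)}{\tau} = \boldsymbol{\lambda},$$

and,

$$\lim_{\tau \to \infty} \frac{T^*(\mathbf{k}_1(\tau))}{\tau} = \lim_{\tau \to \infty} \frac{T^*\left( \tau \frac{\mathbf{k}_1(\tau)}{\tau} \right)}{\tau} = \hat{T}(\boldsymbol{\lambda}) \quad \text{by (11)}. \tag{54}$$

We will show at the end of the proof that the sequence $T^*(\mathbf{k}_1(\tau))/\tau$, $\tau = 1, 2, \ldots$ is uniformly integrable, which will imply that

$$\limsup_{\tau \to \infty} \frac{E[T_2 \,|\, T_1 = \tau, S_1 = s]}{\tau} \le \lim_{\tau \to \infty} E\left[ \frac{T^*(\mathbf{k}_1(\tau))}{\tau} \right] + \epsilon \quad \text{by (52), (53)}$$

$$= E\left[ \lim_{\tau \to \infty} \frac{T^*(\mathbf{k}_1(\tau))}{\tau} \right] + \epsilon \quad \text{by uniform integrability}$$

$$= \hat{T}(\boldsymbol{\lambda}) + \epsilon \quad \text{by (54)}.$$

Therefore, for $\delta$ such that $0 < \delta < 1 - \hat{T}(\boldsymbol{\lambda}) - \epsilon$, there exists $\tau_a$ such that for all pairs $(\tau, s)$ with $\tau \ge \tau_a$ it holds,

$$E[T_2 \,|\, T_1 = \tau, S_1 = s] \le (1 - \delta) \tau,$$

hence,

$$E[T_2 \,|\, T_1 = \tau, S_1 = s] \le U + (1 - \delta) \tau,$$

where

$$U = \max_{(\tau, s) \in \mathcal{Q} : \tau < \tau_a} E[T_2 \,|\, T_1 = \tau, S_1 = s].$$

Also, (50) is satisfied since $\lim_{\tau \to \infty} v((\tau, s)) = \lim_{\tau \to \infty} \tau = \infty$.

It remains to show that $T^*(\mathbf{k}_1(\tau))/\tau$, $\tau = 1, 2, \ldots$ is uniformly integrable. Using (5) we have

$$0 \le \frac{T^*(\mathbf{k}_1(\tau))}{\tau} \le C_0 \sum_{i=1}^{n} \frac{k_{1,i}(\tau)}{\tau}. \tag{55}$$

Now, we have with probability one,

$$\lim_{\tau \to \infty} \frac{k_{1,i}(\tau)}{\tau} = \lambda_i.$$

On the other hand, since the length of an epoch $T_1$ is independent of the arrivals during this epoch, we have

$$\frac{E[k_{1,i}(\tau)]}{\tau} = \frac{\lambda_i \tau}{\tau} = \lambda_i.$$

Since the nonnegative sequences $k_{1,i}(\tau)/\tau$, $i = 1, 2, \ldots, n$, $\tau = 1, 2, \ldots$ converge both with probability one and in expectation, they are uniformly integrable (see Theorem 16.4 in 0). Using this fact, uniform integrability of $T^*(\mathbf{k}_1(\tau))/\tau$ follows from (55). ■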



We next present the main theorem of this section, which shows the stability of policy $\pi_\epsilon$.

Theorem 25. For any $\boldsymbol{\lambda} \ge \mathbf{0}$ such that

$$\hat{T}(\boldsymbol{\lambda}) < 1, \tag{56}$$

policy $\pi_\epsilon$ stabilizes the system.



Proof: The idea of the proof is the following. Assume that the system starts at time $t = 0$ in system state $s$, with $\mathbf{A}(0) = \mathbf{k}$ packets at the inputs. We use the queue occupancy notation of $Q_s^{\pi}(t)$, $\mathbf{Q}_s^{\pi}(t)$ from Section IV but we henceforth omit the indices $s$ and $\pi$ to simplify the notation. Under $\pi_\epsilon$, it will be shown through Theorem 22 that (56) implies that we can identify a state $(\tau_a, s_a)$ to which the chain $\{(T_m, S_m)\}_{m=1}^{\infty}$ returns infinitely often. Define $m_l$, $l = 1, \ldots$, to be the sequence of epoch indices when the Markov chain is in state $(\tau_a, s_a)$. Then, due to the Markov property, the process consisting of the successive intervals between the times at which the process $\{(T_m, S_m)\}_{m=1}^{\infty}$ returns to state $(\tau_a, s_a)$, i.e.,

$$L_l = \sum_{m = m_l + 1}^{m_{l+1}} T_m, \quad l = 1, 2, \ldots \tag{57}$$

consists of i.i.d. random variables and, as will be seen,

$$E[L_1] < \infty. \tag{58}$$

Hence, the process

$$Z_0 = \sum_{j=1}^{m_1} T_j, \quad Z_l = Z_{l-1} + L_l, \quad l \ge 1,$$

constitutes a (delayed) renewal process.

Observe next that by the operation of $\pi_\epsilon$, $\{\mathbf{Q}(Z_l)\}_{l=0}^{\infty}$, the number of packets in the system at times $Z_l$, is statistically the same as the number of arrivals in an interval of length $\tau_a$. Since packet arrivals are i.i.d. and the operations of the process during the interval $\tau_a$ do not depend on these arrivals, $\{\mathbf{Q}(Z_l)\}_{l=0}^{\infty}$ consists of i.i.d. random variables with $E[\mathbf{Q}(Z_l)] = \boldsymbol{\lambda} \tau_a < \infty$. This, and the operation of $\pi_\epsilon$, imply that the process $\{Q(t)\}_{t=0}^{\infty}$ is regenerative with respect to $\{Z_l\}_{l=0}^{\infty}$. Let $g$ be the period of the distribution of the cycle length $L_1$. It then follows from Corollary 1.5 p. 128 in [12] and (58) that

$$\lim_{\alpha \to \infty} \frac{1}{g} \sum_{\beta=0}^{g-1} \Pr(Q(\alpha g + \beta) > q) = \lim_{\alpha \to \infty} \frac{1}{g} \sum_{\beta=0}^{g-1} E\left[ 1_{\{Q(\alpha g + \beta) > q\}} \right] = \frac{E\left[ \sum_{j=0}^{L_1 - 1} 1_{\{Q(Z_0 + j) > q\}} \right]}{E[L_1]}. \tag{59}$$

Observe next that the random variables $Y_j(q) = 1_{\{Q(Z_0 + j) > q\}}$ are decreasing in $q$, and since $Q(Z_0 + j)$ are finite, $\lim_{q \to \infty} 1_{\{Q(Z_0 + j) > q\}} = 0$. Using the monotone convergence theorem we then have,

$$\lim_{q \to \infty} E\left[ \sum_{j=0}^{L_1 - 1} 1_{\{Q(Z_0 + j) > q\}} \right] = E\left[ \lim_{q \to \infty} \sum_{j=0}^{L_1 - 1} 1_{\{Q(Z_0 + j) > q\}} \right] = 0. \tag{60}$$

From (59), (60) we conclude that

$$\lim_{q \to \infty} \lim_{\alpha \to \infty} \sum_{\beta=0}^{g-1} \Pr(Q(\alpha g + \beta) > q) = 0. \tag{61}$$

Since for $t = \alpha_t g + \beta_t$ it holds

$$\Pr(Q(t) > q) \le \sum_{\beta=0}^{g-1} \Pr(Q(\alpha_t g + \beta) > q),$$

we conclude from (61) that

$$\lim_{q \to \infty} \limsup_{t \to \infty} \Pr(Q(t) > q) = 0,$$

i.e., policy $\pi_\epsilon$ is stable.

To implement the plan outlined above we must show the existence of a state to which the Markov chain returns infinitely often, as well as (58). For this we will use Theorem 22 but because of the generality of the model under consideration, we cannot a priori claim irreducibility and aperiodicity in order to apply it directly. Instead, we rely first on Lemma 23 using the result of Lemma 24.

Let $C((\tau, s))$ be the communicating class to which a state $(\tau, s)$ belongs. We consider two cases as follows.

a) If $C((\tau, s))$ is essential, and $(T_1, S_1) = (\tau, s)$, we have $(T_m, S_m) \in C((\tau, s))$, $m = 1, 2, \ldots$ and the evolution of the process with initial condition $(\tau, s)$ constitutes an irreducible Markov Chain. If this chain is periodic with period $d$, then the process $\{(T_{dk+1}, S_{dk+1})\}_{k=0}^{\infty}$ is an aperiodic Markov Chain, [9] page 14. For this chain, we can apply Theorem 22 to show positive recurrence, as follows. Since by Lemma 24 the process $\{(T_m, S_m)\}_{m=1}^{\infty}$ satisfies (49), it also satisfies (51). Hence the process $\{(T_{dk+1}, S_{dk+1})\}_{k=0}^{\infty}$ satisfies (49) and since $\lim_{\tau \to \infty} v((\tau, s)) = \infty$, it also satisfies (50). Therefore, by Lemma 23 we can apply Theorem 22 to $\{(T_{dk+1}, S_{dk+1})\}_{k=0}^{\infty}$ to deduce that it is geometrically ergodic with

$$E[v(X)] = E[T] < \infty. \tag{62}$$



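From the above discussion we conclude that the initial state $(\tau, s)$ is visited infinitely often, and the successive visit indices are of the form $m_l = dV_l + 1$, $l = 0, 1, 2, \ldots$, $V_0 = 0$, where $V_l$, $l = 1, \ldots$ are integer valued i.i.d. random variables with

$$E[V_1] < \infty. \tag{63}$$

Let now,

$$\tilde{T}_k = \sum_{j=1}^{d} T_{dk+j}, \quad k = 0, 1, \ldots \tag{64}$$

The nonnegative process $\{\tilde{T}_k\}_{k=0}^{\infty}$ is regenerative with respect to $\{V_l\}_{l=0}^{\infty}$ and by the regenerative theorem and (63) it holds,

$$\lim_{k \to \infty} E[\tilde{T}_k] = \frac{E\left[ \sum_{m=0}^{V_1 - 1} \tilde{T}_m \right]}{E[V_1]}. \tag{65}$$

Observe next that by (57) and (64),

$$E[L_1] = E\left[ \sum_{m=0}^{V_1 - 1} \tilde{T}_m \right]. \tag{66}$$

Hence in order to show (58) it suffices to show

$$E\left[ \sum_{m=0}^{V_1 - 1} \tilde{T}_m \right] < \infty, \tag{67}$$

or, by (63) and (65),

$$\lim_{k \to \infty} E[\tilde{T}_k] < \infty. \tag{68}$$

Notice that by (64) we have,

$$E[\tilde{T}_k] = E\left[ \sum_{m=dk+1}^{d(k+1)} T_m \right] = \sum_{m=dk+1}^{d(k+1)} E[T_m] \le d \left( \frac{U}{\delta} + \tau \right) \quad \text{by (51)},$$

from which (68) follows.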

b) Consider next the case where $C((\tau, s))$ is inessential, i.e., there is at least one state $y \in \mathcal{Q} - C((\tau, s))$ such that for $x \in C((\tau, s))$, $x \rightsquigarrow y$ but $y \not\rightsquigarrow x$; here, with $x$, $y$ we denote pairs of the form $(\tau, s)$. Hence there is at least one other communicating class reachable from $C((\tau, s))$. The communicating classes reachable from $C((\tau, s))$ will be either essential or inessential. We argue that the process $\{(T_m, S_m)\}_{m=1}^{\infty}$ will enter an essential class in finite time. Assume the contrary, that is, there is a set of sample paths $\Omega_I$ with $\Pr\{\Omega_I\} > 0$, for which the process remains always in some inessential class. Since inessential states are nonrecurrent (see [9], Theorem 4, p. 18) the process visits each inessential state only a finite number of times. This implies that on $\Omega_I$, $\lim_{m \to \infty} T_m = \infty$, and since $\Pr\{\Omega_I\} > 0$, we conclude that

$$E\left[ \lim_{m \to \infty} T_m \,\Big|\, (T_1, S_1) = (\tau, s) \right] = \infty.$$

Applying next Fatou's Lemma we have

$$\liminf_{m \to \infty} E[T_m \,|\, (T_1, S_1) = (\tau, s)] \ge E\left[ \lim_{m \to \infty} T_m \,\Big|\, (T_1, S_1) = (\tau, s) \right] = \infty,$$

which contradicts (51).

Since the process enters an essential class in finite time, we can apply the arguments of case a) to complete the proof.



